18
Winyu Chinthammit [email protected] Eric J. Seibel [email protected] Thomas A. Furness, III [email protected] Human Interface Technology Lab 215 Fluke Hall, Box 352142 University of Washington Seattle, WA 98195 Presence, Vol. 12, No. 1, February 2003, 1–18 © 2003 by the Massachusetts Institute of Technology A Shared-Aperture Tracking Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom (DOF) shared-aperture tracking system with image overlay is described. This unique tracking technology shares the same aperture or scanned optical beam with the visual display, virtual retinal display (VRD). This display technology provides high brightness in an AR helmet-mounted display, especially in the extreme environment of a military cock- pit. The VRD generates an image by optically scanning visible light directly to the viewer’s eye. By scanning both visible and infrared light, the head-worn display can be directly coupled to a head-tracking system. As a result, the proposed tracking system requires minimal calibration between the user’s viewpoint and the tracker’s viewpoint. This paper demonstrates that the proposed shared-aperture tracking system produces high accuracy and computational ef ciency. The current proof-of- concept system has a precision of 1/2 0.05 and 1/2 0.01 deg. in the horizontal and vertical axes, respectively. The static registration error was measured to be 0.08 1/2 0.04 and 0.03 1/2 0.02 deg. for the horizontal and vertical axes, respec- tively. The dynamic registration error or the system latency was measured to be within 16.67 ms, equivalent to our display refresh rate of 60 Hz. In all testing, the VRD was xed and the calibrated motion of a robot arm was tracked. By moving the robot arm within a restricted volume, this real-time shared-aperture method of tracking was extended to six-DOF measurements. Future AR applications of our shared-aperture tracking and display system will be highly accurate head tracking when the VRD is helmet mounted and worn within an enclosed space, such as an aircraft cockpit. 1 Introduction In immersive virtual reality (VR) applications, the computer renders all objects in the visual scene. In this case, the precise registration of the virtual images relative to the real world is not important. However, in augmented reality (AR) applications, the computer renders images or graphics information that augment or overlay a real scene, both of which are observed simulta- neously by the user. In this regard, precise alignment of the virtual image rela- tive to real objects is essential. The goal of AR is to enhance human interaction with the real world using computer-generated information that is related to the real-world objects. For example, bioengineering researchers have investi- gated using AR for image guidance during needle biopsy (Stetten, Chib, Hildebrand, & Bursee, 2001). The doctor can look directly at the patient while the AR system projects an ultrasound image over the real one, allowing for simultaneous examination of an ultrasound image and real image during Chinthammit et al. 1

A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

Winyu Chinthammitwinyuhitlwashingtonedu

Eric J Seibeleseibelhitlwashingtonedu

Thomas A Furness IIItfurnesshitlwashingtoneduHuman Interface Technology Lab

215 Fluke Hall Box 352142University of WashingtonSeattle WA 98195

Presence Vol 12 No 1 February 2003 1ndash18

copy 2003 by the Massachusetts Institute of Technology

A Shared-Aperture TrackingDisplay for Augmented Reality

Abstract

The operation and performance of a six degree-of-freedom (DOF) shared-aperture

tracking system with image overlay is described This unique tracking technologyshares the same aperture or scanned optical beam with the visual display virtualretinal display (VRD) This display technology provides high brightness in an AR

helmet-mounted display especially in the extreme environment of a military cock-pit The VRD generates an image by optically scanning visible light directly to theviewerrsquos eye By scanning both visible and infrared light the head-worn display can

be directly coupled to a head-tracking system As a result the proposed trackingsystem requires minimal calibration between the userrsquos viewpoint and the trackerrsquosviewpoint This paper demonstrates that the proposed shared-aperture tracking

system produces high accuracy and computational efciency The current proof-of-concept system has a precision of 12 005 and 12 001 deg in the horizontaland vertical axes respectively The static registration error was measured to be

008 12 004 and 003 12 002 deg for the horizontal and vertical axes respec-tively The dynamic registration error or the system latency was measured to bewithin 1667 ms equivalent to our display refresh rate of 60 Hz In all testing the

VRD was xed and the calibrated motion of a robot arm was tracked By movingthe robot arm within a restricted volume this real-time shared-aperture method oftracking was extended to six-DOF measurements Future AR applications of our

shared-aperture tracking and display system will be highly accurate head trackingwhen the VRD is helmet mounted and worn within an enclosed space such as anaircraft cockpit

1 Introduction

In immersive virtual reality (VR) applications the computer renders allobjects in the visual scene In this case the precise registration of the virtualimages relative to the real world is not important However in augmentedreality (AR) applications the computer renders images or graphics informationthat augment or overlay a real scene both of which are observed simulta-neously by the user In this regard precise alignment of the virtual image rela-tive to real objects is essential The goal of AR is to enhance human interactionwith the real world using computer-generated information that is related tothe real-world objects For example bioengineering researchers have investi-gated using AR for image guidance during needle biopsy (Stetten ChibHildebrand amp Bursee 2001) The doctor can look directly at the patientwhile the AR system projects an ultrasound image over the real one allowingfor simultaneous examination of an ultrasound image and real image during

Chinthammit et al 1

the operation Another augmented reality application isrealized in ghter cockpit head-up displays wherein in-formation related to the aircraft systems (such as naviga-tional waypoints) are superimposed and registeredthrough a see-through display over the real-world sceneas viewed through the display

To superimpose relevant information over the realworld AR applications may use a display that ismounted on an object in the real world (such as an au-tomobile dashboard or windshield) incorporated into ahandheld object (display wand) or mounted on theheadhelmet worn by the user Although these displaysmay present state or graphic information which is inde-pendent of where the virtual images appear in the realworld most AR applications require a precise stabiliza-tion of these images for superimposition relative to thereal world In those applications involving head-mounted displays the performance of the head-trackingsystem is a critical part of the image stabilization pro-cess It determines the userrsquos viewpoint so that the aug-menting virtual image is always aligned relative to thereal object

The main function of the head-tracking system is tosignal to the computer graphics generator the instanta-neous position and orientation of the headhelmet dis-play scene so that the computer system can generate andposition the graphics within that display eld of view(FOV) such that the objects appear stabilized in spaceSeveral tracking technologies are widely implemented inthe AR community (such as ducial or video trackingoptical tracking and inertial tracking) Each approachhas its own advantages and disadvantages For examplewithin a constrained environment the placement of -ducials within the physical environment has producedsingle-pixel tracking accuracy in several AR applications(Azuma 1997) However the ducial location in worldcoordinates must be known and high-accuracy trackingdepends on continuously maintaining a ducial withinthe instantaneous FOV of the tracking cameras Targettracking with a ducial system may also require signi-cant time to process the appropriate algorithms (You ampNeumann 2001)

To simplify the computational requirements for im-age processing optical beam scanning has been used to

track head orientation using synchronized detection inthe time domain (Rolland Davis amp Baillot 2001) Anadditional feature of optical beam scanning is the higherbandwidth of the individual optical detectors sensingtarget motion (Sorensen Donath Yang amp Starr1989) Thus landmark-tracking problems associatedwith motion blur due to the slow 30 ndash60 Hz videoframerates can be avoided (Yokokohji Sugawara amp Yo-skikawa 2000) But the setup time for optical beamscanning system can be extensive because such a systemrequires major installation within the environmentwhere the AR system will be used Another trackingtechnology is inertial tracking Inertial tracking technol-ogies are more self-contained than ducial tracking andoptical tracking systems and therefore the setup time isminimal However a major challenge in these systems isthe tendency to drift resulting in tracker errors in super-imposing the virtual images relative to real-world ob-jects (You Neumann amp Azuma 1999)

For AR to be effective the objects in the real and vir-tual environments must be properly aligned This align-ment process is called registration and it is one of thefundamental problems currently limiting AR applica-tions For example with inaccuracies in a head-trackingsystem users may perceive movement of a virtual cupon a real table while they walk around looking at thecup relative to the table This greatly compromises asense of reality that the user perceives and it compro-mises the utility of AR

The sources of the registration error can be dividedinto two types static and dynamic Intuitively static anddynamic errors occur when the user is stationary andwhen the user is in motion respectively The sources ofstatic errors are calibration error and tracker error Be-cause a typical AR system consists of tracking and dis-play subsystems calibration procedures are required toalign the viewpoint between the two A calibration errormeans that the coordinate system of the tracker is notaligned with the real world The primary sources of dy-namic errors are system latency and an invalid simulta-neity assumption of measurements The computationaltime and cameradisplay framerates are often the largestcontributors to the system latency

Creating a sense of reality requires not only an accu-

2 PRESENCE VOLUME 12 NUMBER 1

rate registration but also a high quality of the displayimage The luminance of augmented images must besufciently bright to be seen relative to the ambientlight from the real environment In a see-throughmode current AR display devices such as transmissionmode liquid crystal displays have insufcient luminanceto compete with light from the real environment As aresult a virtual image that is superimposed onto the realworld often appears to be ghostlike or low contrastSuch a problem can be signicant for military cockpitapplications wherein the pilot can encounter an extremerange of ambient lighting conditions Next we describea novel display that was invented in 1991 at the HumanInterface Technology Laboratory at the University ofWashington

This novel display provides a head-mounted display ofsufciently high luminance resolution and good formfactor (weight and size) for augmented reality applica-tions The display device is called the virtual retinal dis-play (VRD) The VRD scans laser light directly to theretina of the eye The VRD consists of photon genera-tors (for red green and blue light) light modulatorber optics coupler and a two-axis mechanical scannerwhich uses two oscillating mirrors to deect a beam ofmodulated laser light The beam is scanned through acombinerbeamsplitter and other optical elements toform a real image on the retina of the viewerrsquos eye Thelight beam is intensity-modulated while being scannedto render a two-dimensional image on the retina TheVRD is then perceived as a virtual image hovering inspace Even though only one pixel from the raster scanis being illuminated on the retina the viewer sees a sta-ble image if the framerate exceeds the critical icker fre-quency of 55ndash60 Hz (Kelly et al 2001) The lumi-nance of the VRD is capable of competing with thelight from the real environment thus we consider theVRD as an excellent alternative display device in ARapplications For a discussion on the past and currentengineering developments of the VRD technology seeJohnston and Willey (1995) and wwwmviscom

In a conventional VRD conguration the beamsplit-ter reects some energy (approximately 40) of visiblelight into the environment while the remaining light isforming a visible image on the userrsquos retina therefore

the beam is being scanned simultaneously to the eyeand into the environment If light-sensing detectors areplaced into the real-world environment where the VRDexternal beam is being scanned and given the knownraster-scanning pattern of the VRD then the orienta-tion and position of the VRD (or the userrsquos head) canbe detected at the same time an image is being pro-jected on the retina This artifact of the VRD suggeststhat it is feasible to incorporate a head-tracking func-tionality into the VRD In doing so an accurate regis-tration and a high quality of the display image can beaccomplished as we discuss later

We have constructed a working prototype of such ahead-tracking system that is directly coupled with theretinal display using the same optical pathway As a re-sult the proposed system requires minimal alignmentbetween the display coordinate system and the trackercoordinate system This signicantly reduces calibrationerror between the two Furthermore the required com-putational overhead can be low due to its hardware-based conguration (rather than the software-basedconguration of computer vision tracking systems) Thiscan reduce the overall system latency Another advan-tage of this approach is that tracking can be accom-plished with high precision because precision is limitedonly by the speed and accuracy of the data acquisitionsystem and not by the scanning resolution of the VRDUnlike computer vision techniques the z axis distance(that is that distance between the displaytracker to thesensing mechanism) does not affect precision

This paper describes the operation and performanceof a six-DOF shared-aperture tracking system with im-age overlay We have titled our system the shared aper-ture retinal scanning display and tracking system(SARSDT) In the ensuing sections we detail the oper-ation of the SARSDT and assess target-tracking perfor-mance (statically and dynamically) with image overlayover real-world objects The SARSDT is unique becausethe shared-aperture permits coupling of high-qualityvirtual information with real images and simultaneouslytracks the location of the virtual images within the real-world scene Because the optical tracker and displayshare the same aperture highly accurate and efcienttracking overlay methods can be applied in AR applica-

Chinthammit et al 3

tions (Kutulakos amp Vallino 1998 Kato amp Billinghurst1999) Furthermore when a single point within theFOV is tracked across the 2D projection of the 3Dscene simplied image-plane interaction techniques forAR become feasible (Pierce et al 1997) This shared-aperture display and tracking system can therefore allevi-ate shortcomings of current head-tracking systems asdiscussed previously

2 System Concept

We rst describe a fundamental concept of ourproposed shared-aperture retinal scanning display andtracking system It illustrates how the VRD optical pathcan be used simultaneously as an optical tracking sys-tem Then several detection techniques are describedIt explains what techniques we have implemented todetect an optical scanning beam Lastly we explain thetracking principles of our SARSDT system

21 Shared Aperture Retinal ScanningDisplay and Tracking System

Simply stated the SARSDT is a VRD that hasbeen modied to provide a tracking function by sharing

coaxially the optical path of the VRD (Figure 1 shows asimplied diagram of the system) Visible and IR lasersgenerate a multispectral beam of infrared and visiblelight The visible beam is modulated by the video signal(as in the normal VRD) and the IR beam is unmodu-lated The IR and visual beams are combined beforebeing manipulated by two mirrors which scan thebeams in a raster pattern A high-speed resonant scannerscans the beams back and forth in the horizontal axiswhile a galvanometer mirror scans the beams in the ver-tical axis (Note that in the case of the VRD the displayis generated by active raster lines that are scanned inboth directions left to right and right to left as opposedto normal NTSC raster lines which are only scanned ina left-to-right direction) To draw the raster the hori-zontal scanner moves the beam to draw a row of pixelsat approximately 1575 kHz then the vertical scannermoves the beam at VGA standards to the next linewhere another row of pixels is drawn with all lines re-freshed at 60 Hz

The scanned beam is then passed through a wave-length selective combinerbeamsplitter that separatesthe visible and IR components of the beams The visibleportion of the beam passes through the combiner and isreected by a spherical converging mirror to form anexit pupil through which the image is transferred

Figure 1 Simplied operation of the shared-aperture displaytracking system

4 PRESENCE VOLUME 12 NUMBER 1

through the entrance pupil of the eye to the retina TheIR portion of the beam is reected in the opposite di-rection from the beamsplitter resulting in a raster of IRlight being scanned within the same eld of view as thedisplay image

Because the VRD is mounted on headgear the out-going tracking scanning pattern is aligned with the dis-play scanning pattern Given the known raster scanningpattern of both the IR and visible beams we can deter-mine the instantaneous orientation and position of thedisplay scene using a set of infrared detectors installed inthe environment (for example the cockpit in gure 1)These detectors measure the instant when the scanningIR beam is coincident with each detector The proposedtracking system is based on measuring the timing (timeduration between measured events) within the rasterscan of the display

A timing measurement from the detector determinesthe position of the display raster relative to the positionof the detector in two dimensions If several detectorsare used the absolute position and orientation (for ex-ample six degrees of freedom) of the head relative tothe outside world can be calculated We describe in sub-section 22 some of the detection techniques we havedeveloped

22 Detection Techniques

The easiest way to determine the vertical orienta-tion of the display relative to the detectors is to countthe number of raster lines from the top of the displayframe to the instant a line passes a detector Given theinstantaneous vertical eld of view of the display theline count accurately determines how far down theframe the display has scanned when the detector isreached Because several scan lines may reach the detec-tor an analysis of successive scan lines allows the proces-sor to pick the middle line or interpolate between linesHowever the determination of the horizontal orienta-tion is a far more difcult issue The following para-graphs explain how an accurate determination of thehorizontal orientation is obtained

To determine horizontal orientation of the displayimage the detected IR pulse can be timed from the

start of each display frame as in the vertical case How-ever the exact horizontal scanning frequency in theVRDrsquos mechanical resonant scanner (MRS) is difcultto measure to a high degree of accuracy Therefore theresulting error in determining a horizontal locationwithin the overall raster scan (pixel error) dramaticallyincreases if the detected signal is referred to only thebeginning of the display frame This error accumulateswith each successive scan line Even though the methodof counting from the beginning of each frame is accu-rate enough for determining the vertical location as be-fore the horizontal location within a scan line needs tobe determined separately from the vertical location

A more straightforward method for determining thehorizontal location within a scan line is to refer to theMRS driving signal at the actual beginning of every scanline By counting the time from the beginning of thatline to the detection event the horizontal position ofthe display scene relative to the detector is determinedBut this assumes that there is stability between the ac-tual locations of the mirror location relative to the MRSdrive signal Unfortunately this may not be the caseWe have observed that the natural or resonant fre-quency of the MRS varies with environmental changessuch as temperature As a result the phase of the drivingsignal no longer matches the actual phase of the scan-ner

To negate the effect of such a phase drift the actualbeginning of the scan line is needed Our initial ap-proach to measure the actual beginning of the scan linewas to put an optical sensor at the edge of the scannedeld to detect the beginning of every scan line How-ever the performance was susceptible to slight mechani-cal misalignment between the sensor strip and the actualscanned eld

To overcome this problem we developed a dual de-tector approach (Refer to gure 2 and 3) Because mul-tiple IR scan lines are likely to be detected in eachscanned frame the time between each IR pulse deter-mines the general location within the scan line How-ever with one detector the direction of the horizontalscan when the beam rst hits the detector in any givenframe (such as going from left to right or vice versa) isarbitrary Therefore a second detector is placed to the

Chinthammit et al 5

side of the rst detector along the horizontal scan axisas illustrated in gure 2 creating multiple IR pulsesfrom each detector as shown in gure 3 Because thescanned beam strikes the two detectors sequentially thescan direction can be determined by the sequence ofdetection events

To obtain a precise horizontal location within thescan line one of the two detectors is chosen as the ref-erence As shown in gure 3 time detector A is the ref-erence detector and the timing regiment is when the IRbeam strikes detector A to when the returning beamalso strikes detector A We assume that this time dura-tion or measured pulse width is twice as long as the timefrom detector A to the scan edge or the actual begin-ning of the scan line The corresponding horizontal lo-cation is then calculated by halving this timing durationIn this way the beginning of the scan line is inferredand the location of the detector along the horizontalline relative to that beginning is determined Becauseseveral scan lines are likely to pass by the detector wecollect all of the events and pick the one in the middle(that is the middle of an odd set of scan lines or inter-polate if there are an even number) to determine thehorizontal location of the raster relative to the detector

To reiterate the timing from the beginning of theframe to the time when a beam hits detector A is usedto calculate the corresponding vertical location of theraster relative to the detector Timing from the intervalof successive passes of scan lines across the detectors isused to determine the horizontal location of the rasterrelative to the detector By knowing the location of thedetectors in the cockpit the relative orientation of thedisplay raster in azimuth and elevation angles can becalculated By locating three or more detectors in the

real-world environment that are within the instanta-neous external scan eld the position and orientation ofthe display raster in six degrees of freedom can be deter-mined This aspect is discussed in subsection 23

23 Tracking Principle

As stated a detectorrsquos 2D horizontal and verticallocation within the raster scan is the basis upon whichwe make all of our calculations of position and orienta-tion of the display raster in space We start by trackingin two dimensions in the time domain and then by cal-culation from multiple detectors relate that to trackingin the 3D spatial domain

231 Tracking in the Time Domain Track-ing in the time domain is dened in two dimensions as(pixel line) coordinates for horizontal and vertical loca-tions in the tracking raster respectively We use astraightforward way to calculate the corresponding pixeland line positions from the timing measurement ob-tained in the detection system For the pixel calculationthe horizontal timing is divided by a pixel clock (40nspixel) to obtain the corresponding pixel locationFor the line calculation the vertical timing is used todirectly calculate the corresponding line location by di-viding by the scan line time duration It is worth notingthat the error from not having an absolute scan linetime duration of the MRS is unlikely to have much ef-fect on the line accuracy This is because the truncatedresult does not likely reect the error Consequentiallywe can obtain a (pixel line) coordinate of the target(detector) in two dimensions

Even though 2D tracking in the time domain can beobtained via a single detector we are only capable ofsuperimposing an image or overlay in the direction ofthat single detector For example if the VRD is to su-perimpose an image slightly to the side of the detectorthe registration of the augmented image is changed inlocation slightly as the detector moves It is due to non-uniformity in pixel location and spacing within the dis-play FOV Consequently target-tracking applicationsare limited To fully use the SARSDT as an AR systemit is necessary to know where each pixel in the display

Figure 2 The scanned IR eld

6 PRESENCE VOLUME 12 NUMBER 1

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 2: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

the operation Another augmented reality application isrealized in ghter cockpit head-up displays wherein in-formation related to the aircraft systems (such as naviga-tional waypoints) are superimposed and registeredthrough a see-through display over the real-world sceneas viewed through the display

To superimpose relevant information over the realworld AR applications may use a display that ismounted on an object in the real world (such as an au-tomobile dashboard or windshield) incorporated into ahandheld object (display wand) or mounted on theheadhelmet worn by the user Although these displaysmay present state or graphic information which is inde-pendent of where the virtual images appear in the realworld most AR applications require a precise stabiliza-tion of these images for superimposition relative to thereal world In those applications involving head-mounted displays the performance of the head-trackingsystem is a critical part of the image stabilization pro-cess It determines the userrsquos viewpoint so that the aug-menting virtual image is always aligned relative to thereal object

The main function of the head-tracking system is tosignal to the computer graphics generator the instanta-neous position and orientation of the headhelmet dis-play scene so that the computer system can generate andposition the graphics within that display eld of view(FOV) such that the objects appear stabilized in spaceSeveral tracking technologies are widely implemented inthe AR community (such as ducial or video trackingoptical tracking and inertial tracking) Each approachhas its own advantages and disadvantages For examplewithin a constrained environment the placement of -ducials within the physical environment has producedsingle-pixel tracking accuracy in several AR applications(Azuma 1997) However the ducial location in worldcoordinates must be known and high-accuracy trackingdepends on continuously maintaining a ducial withinthe instantaneous FOV of the tracking cameras Targettracking with a ducial system may also require signi-cant time to process the appropriate algorithms (You ampNeumann 2001)

To simplify the computational requirements for im-age processing optical beam scanning has been used to

track head orientation using synchronized detection inthe time domain (Rolland Davis amp Baillot 2001) Anadditional feature of optical beam scanning is the higherbandwidth of the individual optical detectors sensingtarget motion (Sorensen Donath Yang amp Starr1989) Thus landmark-tracking problems associatedwith motion blur due to the slow 30 ndash60 Hz videoframerates can be avoided (Yokokohji Sugawara amp Yo-skikawa 2000) But the setup time for optical beamscanning system can be extensive because such a systemrequires major installation within the environmentwhere the AR system will be used Another trackingtechnology is inertial tracking Inertial tracking technol-ogies are more self-contained than ducial tracking andoptical tracking systems and therefore the setup time isminimal However a major challenge in these systems isthe tendency to drift resulting in tracker errors in super-imposing the virtual images relative to real-world ob-jects (You Neumann amp Azuma 1999)

For AR to be effective the objects in the real and vir-tual environments must be properly aligned This align-ment process is called registration and it is one of thefundamental problems currently limiting AR applica-tions For example with inaccuracies in a head-trackingsystem users may perceive movement of a virtual cupon a real table while they walk around looking at thecup relative to the table This greatly compromises asense of reality that the user perceives and it compro-mises the utility of AR

The sources of the registration error can be dividedinto two types static and dynamic Intuitively static anddynamic errors occur when the user is stationary andwhen the user is in motion respectively The sources ofstatic errors are calibration error and tracker error Be-cause a typical AR system consists of tracking and dis-play subsystems calibration procedures are required toalign the viewpoint between the two A calibration errormeans that the coordinate system of the tracker is notaligned with the real world The primary sources of dy-namic errors are system latency and an invalid simulta-neity assumption of measurements The computationaltime and cameradisplay framerates are often the largestcontributors to the system latency

Creating a sense of reality requires not only an accu-

2 PRESENCE VOLUME 12 NUMBER 1

rate registration but also a high quality of the displayimage The luminance of augmented images must besufciently bright to be seen relative to the ambientlight from the real environment In a see-throughmode current AR display devices such as transmissionmode liquid crystal displays have insufcient luminanceto compete with light from the real environment As aresult a virtual image that is superimposed onto the realworld often appears to be ghostlike or low contrastSuch a problem can be signicant for military cockpitapplications wherein the pilot can encounter an extremerange of ambient lighting conditions Next we describea novel display that was invented in 1991 at the HumanInterface Technology Laboratory at the University ofWashington

This novel display provides a head-mounted display ofsufciently high luminance resolution and good formfactor (weight and size) for augmented reality applica-tions The display device is called the virtual retinal dis-play (VRD) The VRD scans laser light directly to theretina of the eye The VRD consists of photon genera-tors (for red green and blue light) light modulatorber optics coupler and a two-axis mechanical scannerwhich uses two oscillating mirrors to deect a beam ofmodulated laser light The beam is scanned through acombinerbeamsplitter and other optical elements toform a real image on the retina of the viewerrsquos eye Thelight beam is intensity-modulated while being scannedto render a two-dimensional image on the retina TheVRD is then perceived as a virtual image hovering inspace Even though only one pixel from the raster scanis being illuminated on the retina the viewer sees a sta-ble image if the framerate exceeds the critical icker fre-quency of 55ndash60 Hz (Kelly et al 2001) The lumi-nance of the VRD is capable of competing with thelight from the real environment thus we consider theVRD as an excellent alternative display device in ARapplications For a discussion on the past and currentengineering developments of the VRD technology seeJohnston and Willey (1995) and wwwmviscom

In a conventional VRD conguration the beamsplit-ter reects some energy (approximately 40) of visiblelight into the environment while the remaining light isforming a visible image on the userrsquos retina therefore

the beam is being scanned simultaneously to the eyeand into the environment If light-sensing detectors areplaced into the real-world environment where the VRDexternal beam is being scanned and given the knownraster-scanning pattern of the VRD then the orienta-tion and position of the VRD (or the userrsquos head) canbe detected at the same time an image is being pro-jected on the retina This artifact of the VRD suggeststhat it is feasible to incorporate a head-tracking func-tionality into the VRD In doing so an accurate regis-tration and a high quality of the display image can beaccomplished as we discuss later

We have constructed a working prototype of such ahead-tracking system that is directly coupled with theretinal display using the same optical pathway As a re-sult the proposed system requires minimal alignmentbetween the display coordinate system and the trackercoordinate system This signicantly reduces calibrationerror between the two Furthermore the required com-putational overhead can be low due to its hardware-based conguration (rather than the software-basedconguration of computer vision tracking systems) Thiscan reduce the overall system latency Another advan-tage of this approach is that tracking can be accom-plished with high precision because precision is limitedonly by the speed and accuracy of the data acquisitionsystem and not by the scanning resolution of the VRDUnlike computer vision techniques the z axis distance(that is that distance between the displaytracker to thesensing mechanism) does not affect precision

This paper describes the operation and performanceof a six-DOF shared-aperture tracking system with im-age overlay We have titled our system the shared aper-ture retinal scanning display and tracking system(SARSDT) In the ensuing sections we detail the oper-ation of the SARSDT and assess target-tracking perfor-mance (statically and dynamically) with image overlayover real-world objects The SARSDT is unique becausethe shared-aperture permits coupling of high-qualityvirtual information with real images and simultaneouslytracks the location of the virtual images within the real-world scene Because the optical tracker and displayshare the same aperture highly accurate and efcienttracking overlay methods can be applied in AR applica-

Chinthammit et al 3

tions (Kutulakos amp Vallino 1998 Kato amp Billinghurst1999) Furthermore when a single point within theFOV is tracked across the 2D projection of the 3Dscene simplied image-plane interaction techniques forAR become feasible (Pierce et al 1997) This shared-aperture display and tracking system can therefore allevi-ate shortcomings of current head-tracking systems asdiscussed previously

2 System Concept

We rst describe a fundamental concept of ourproposed shared-aperture retinal scanning display andtracking system It illustrates how the VRD optical pathcan be used simultaneously as an optical tracking sys-tem Then several detection techniques are describedIt explains what techniques we have implemented todetect an optical scanning beam Lastly we explain thetracking principles of our SARSDT system

21 Shared Aperture Retinal ScanningDisplay and Tracking System

Simply stated the SARSDT is a VRD that hasbeen modied to provide a tracking function by sharing

coaxially the optical path of the VRD (Figure 1 shows asimplied diagram of the system) Visible and IR lasersgenerate a multispectral beam of infrared and visiblelight The visible beam is modulated by the video signal(as in the normal VRD) and the IR beam is unmodu-lated The IR and visual beams are combined beforebeing manipulated by two mirrors which scan thebeams in a raster pattern A high-speed resonant scannerscans the beams back and forth in the horizontal axiswhile a galvanometer mirror scans the beams in the ver-tical axis (Note that in the case of the VRD the displayis generated by active raster lines that are scanned inboth directions left to right and right to left as opposedto normal NTSC raster lines which are only scanned ina left-to-right direction) To draw the raster the hori-zontal scanner moves the beam to draw a row of pixelsat approximately 1575 kHz then the vertical scannermoves the beam at VGA standards to the next linewhere another row of pixels is drawn with all lines re-freshed at 60 Hz

The scanned beam is then passed through a wave-length selective combinerbeamsplitter that separatesthe visible and IR components of the beams The visibleportion of the beam passes through the combiner and isreected by a spherical converging mirror to form anexit pupil through which the image is transferred

Figure 1 Simplied operation of the shared-aperture displaytracking system

4 PRESENCE VOLUME 12 NUMBER 1

through the entrance pupil of the eye to the retina TheIR portion of the beam is reected in the opposite di-rection from the beamsplitter resulting in a raster of IRlight being scanned within the same eld of view as thedisplay image

Because the VRD is mounted on headgear the out-going tracking scanning pattern is aligned with the dis-play scanning pattern Given the known raster scanningpattern of both the IR and visible beams we can deter-mine the instantaneous orientation and position of thedisplay scene using a set of infrared detectors installed inthe environment (for example the cockpit in gure 1)These detectors measure the instant when the scanningIR beam is coincident with each detector The proposedtracking system is based on measuring the timing (timeduration between measured events) within the rasterscan of the display

A timing measurement from the detector determinesthe position of the display raster relative to the positionof the detector in two dimensions If several detectorsare used the absolute position and orientation (for ex-ample six degrees of freedom) of the head relative tothe outside world can be calculated We describe in sub-section 22 some of the detection techniques we havedeveloped

22 Detection Techniques

The easiest way to determine the vertical orienta-tion of the display relative to the detectors is to countthe number of raster lines from the top of the displayframe to the instant a line passes a detector Given theinstantaneous vertical eld of view of the display theline count accurately determines how far down theframe the display has scanned when the detector isreached Because several scan lines may reach the detec-tor an analysis of successive scan lines allows the proces-sor to pick the middle line or interpolate between linesHowever the determination of the horizontal orienta-tion is a far more difcult issue The following para-graphs explain how an accurate determination of thehorizontal orientation is obtained

To determine horizontal orientation of the displayimage the detected IR pulse can be timed from the

start of each display frame as in the vertical case How-ever the exact horizontal scanning frequency in theVRDrsquos mechanical resonant scanner (MRS) is difcultto measure to a high degree of accuracy Therefore theresulting error in determining a horizontal locationwithin the overall raster scan (pixel error) dramaticallyincreases if the detected signal is referred to only thebeginning of the display frame This error accumulateswith each successive scan line Even though the methodof counting from the beginning of each frame is accu-rate enough for determining the vertical location as be-fore the horizontal location within a scan line needs tobe determined separately from the vertical location

A more straightforward method for determining thehorizontal location within a scan line is to refer to theMRS driving signal at the actual beginning of every scanline By counting the time from the beginning of thatline to the detection event the horizontal position ofthe display scene relative to the detector is determinedBut this assumes that there is stability between the ac-tual locations of the mirror location relative to the MRSdrive signal Unfortunately this may not be the caseWe have observed that the natural or resonant fre-quency of the MRS varies with environmental changessuch as temperature As a result the phase of the drivingsignal no longer matches the actual phase of the scan-ner

To negate the effect of such a phase drift the actualbeginning of the scan line is needed Our initial ap-proach to measure the actual beginning of the scan linewas to put an optical sensor at the edge of the scannedeld to detect the beginning of every scan line How-ever the performance was susceptible to slight mechani-cal misalignment between the sensor strip and the actualscanned eld

To overcome this problem we developed a dual de-tector approach (Refer to gure 2 and 3) Because mul-tiple IR scan lines are likely to be detected in eachscanned frame the time between each IR pulse deter-mines the general location within the scan line How-ever with one detector the direction of the horizontalscan when the beam rst hits the detector in any givenframe (such as going from left to right or vice versa) isarbitrary Therefore a second detector is placed to the

Chinthammit et al 5

side of the rst detector along the horizontal scan axisas illustrated in gure 2 creating multiple IR pulsesfrom each detector as shown in gure 3 Because thescanned beam strikes the two detectors sequentially thescan direction can be determined by the sequence ofdetection events

To obtain a precise horizontal location within thescan line one of the two detectors is chosen as the ref-erence As shown in gure 3 time detector A is the ref-erence detector and the timing regiment is when the IRbeam strikes detector A to when the returning beamalso strikes detector A We assume that this time dura-tion or measured pulse width is twice as long as the timefrom detector A to the scan edge or the actual begin-ning of the scan line The corresponding horizontal lo-cation is then calculated by halving this timing durationIn this way the beginning of the scan line is inferredand the location of the detector along the horizontalline relative to that beginning is determined Becauseseveral scan lines are likely to pass by the detector wecollect all of the events and pick the one in the middle(that is the middle of an odd set of scan lines or inter-polate if there are an even number) to determine thehorizontal location of the raster relative to the detector

To reiterate the timing from the beginning of theframe to the time when a beam hits detector A is usedto calculate the corresponding vertical location of theraster relative to the detector Timing from the intervalof successive passes of scan lines across the detectors isused to determine the horizontal location of the rasterrelative to the detector By knowing the location of thedetectors in the cockpit the relative orientation of thedisplay raster in azimuth and elevation angles can becalculated By locating three or more detectors in the

real-world environment that are within the instanta-neous external scan eld the position and orientation ofthe display raster in six degrees of freedom can be deter-mined This aspect is discussed in subsection 23

23 Tracking Principle

As stated a detectorrsquos 2D horizontal and verticallocation within the raster scan is the basis upon whichwe make all of our calculations of position and orienta-tion of the display raster in space We start by trackingin two dimensions in the time domain and then by cal-culation from multiple detectors relate that to trackingin the 3D spatial domain

231 Tracking in the Time Domain Track-ing in the time domain is dened in two dimensions as(pixel line) coordinates for horizontal and vertical loca-tions in the tracking raster respectively We use astraightforward way to calculate the corresponding pixeland line positions from the timing measurement ob-tained in the detection system For the pixel calculationthe horizontal timing is divided by a pixel clock (40nspixel) to obtain the corresponding pixel locationFor the line calculation the vertical timing is used todirectly calculate the corresponding line location by di-viding by the scan line time duration It is worth notingthat the error from not having an absolute scan linetime duration of the MRS is unlikely to have much ef-fect on the line accuracy This is because the truncatedresult does not likely reect the error Consequentiallywe can obtain a (pixel line) coordinate of the target(detector) in two dimensions

Even though 2D tracking in the time domain can beobtained via a single detector we are only capable ofsuperimposing an image or overlay in the direction ofthat single detector For example if the VRD is to su-perimpose an image slightly to the side of the detectorthe registration of the augmented image is changed inlocation slightly as the detector moves It is due to non-uniformity in pixel location and spacing within the dis-play FOV Consequently target-tracking applicationsare limited To fully use the SARSDT as an AR systemit is necessary to know where each pixel in the display

Figure 2 The scanned IR eld

6 PRESENCE VOLUME 12 NUMBER 1

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 3: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

rate registration but also a high quality of the displayimage The luminance of augmented images must besufciently bright to be seen relative to the ambientlight from the real environment In a see-throughmode current AR display devices such as transmissionmode liquid crystal displays have insufcient luminanceto compete with light from the real environment As aresult a virtual image that is superimposed onto the realworld often appears to be ghostlike or low contrastSuch a problem can be signicant for military cockpitapplications wherein the pilot can encounter an extremerange of ambient lighting conditions Next we describea novel display that was invented in 1991 at the HumanInterface Technology Laboratory at the University ofWashington

This novel display provides a head-mounted display ofsufciently high luminance resolution and good formfactor (weight and size) for augmented reality applica-tions The display device is called the virtual retinal dis-play (VRD) The VRD scans laser light directly to theretina of the eye The VRD consists of photon genera-tors (for red green and blue light) light modulatorber optics coupler and a two-axis mechanical scannerwhich uses two oscillating mirrors to deect a beam ofmodulated laser light The beam is scanned through acombinerbeamsplitter and other optical elements toform a real image on the retina of the viewerrsquos eye Thelight beam is intensity-modulated while being scannedto render a two-dimensional image on the retina TheVRD is then perceived as a virtual image hovering inspace Even though only one pixel from the raster scanis being illuminated on the retina the viewer sees a sta-ble image if the framerate exceeds the critical icker fre-quency of 55ndash60 Hz (Kelly et al 2001) The lumi-nance of the VRD is capable of competing with thelight from the real environment thus we consider theVRD as an excellent alternative display device in ARapplications For a discussion on the past and currentengineering developments of the VRD technology seeJohnston and Willey (1995) and wwwmviscom

In a conventional VRD conguration the beamsplit-ter reects some energy (approximately 40) of visiblelight into the environment while the remaining light isforming a visible image on the userrsquos retina therefore

the beam is being scanned simultaneously to the eyeand into the environment If light-sensing detectors areplaced into the real-world environment where the VRDexternal beam is being scanned and given the knownraster-scanning pattern of the VRD then the orienta-tion and position of the VRD (or the userrsquos head) canbe detected at the same time an image is being pro-jected on the retina This artifact of the VRD suggeststhat it is feasible to incorporate a head-tracking func-tionality into the VRD In doing so an accurate regis-tration and a high quality of the display image can beaccomplished as we discuss later

We have constructed a working prototype of such ahead-tracking system that is directly coupled with theretinal display using the same optical pathway As a re-sult the proposed system requires minimal alignmentbetween the display coordinate system and the trackercoordinate system This signicantly reduces calibrationerror between the two Furthermore the required com-putational overhead can be low due to its hardware-based conguration (rather than the software-basedconguration of computer vision tracking systems) Thiscan reduce the overall system latency Another advan-tage of this approach is that tracking can be accom-plished with high precision because precision is limitedonly by the speed and accuracy of the data acquisitionsystem and not by the scanning resolution of the VRDUnlike computer vision techniques the z axis distance(that is that distance between the displaytracker to thesensing mechanism) does not affect precision

This paper describes the operation and performanceof a six-DOF shared-aperture tracking system with im-age overlay We have titled our system the shared aper-ture retinal scanning display and tracking system(SARSDT) In the ensuing sections we detail the oper-ation of the SARSDT and assess target-tracking perfor-mance (statically and dynamically) with image overlayover real-world objects The SARSDT is unique becausethe shared-aperture permits coupling of high-qualityvirtual information with real images and simultaneouslytracks the location of the virtual images within the real-world scene Because the optical tracker and displayshare the same aperture highly accurate and efcienttracking overlay methods can be applied in AR applica-

Chinthammit et al 3

tions (Kutulakos amp Vallino 1998 Kato amp Billinghurst1999) Furthermore when a single point within theFOV is tracked across the 2D projection of the 3Dscene simplied image-plane interaction techniques forAR become feasible (Pierce et al 1997) This shared-aperture display and tracking system can therefore allevi-ate shortcomings of current head-tracking systems asdiscussed previously

2 System Concept

We rst describe a fundamental concept of ourproposed shared-aperture retinal scanning display andtracking system It illustrates how the VRD optical pathcan be used simultaneously as an optical tracking sys-tem Then several detection techniques are describedIt explains what techniques we have implemented todetect an optical scanning beam Lastly we explain thetracking principles of our SARSDT system

21 Shared Aperture Retinal ScanningDisplay and Tracking System

Simply stated the SARSDT is a VRD that hasbeen modied to provide a tracking function by sharing

coaxially the optical path of the VRD (Figure 1 shows asimplied diagram of the system) Visible and IR lasersgenerate a multispectral beam of infrared and visiblelight The visible beam is modulated by the video signal(as in the normal VRD) and the IR beam is unmodu-lated The IR and visual beams are combined beforebeing manipulated by two mirrors which scan thebeams in a raster pattern A high-speed resonant scannerscans the beams back and forth in the horizontal axiswhile a galvanometer mirror scans the beams in the ver-tical axis (Note that in the case of the VRD the displayis generated by active raster lines that are scanned inboth directions left to right and right to left as opposedto normal NTSC raster lines which are only scanned ina left-to-right direction) To draw the raster the hori-zontal scanner moves the beam to draw a row of pixelsat approximately 1575 kHz then the vertical scannermoves the beam at VGA standards to the next linewhere another row of pixels is drawn with all lines re-freshed at 60 Hz

The scanned beam is then passed through a wave-length selective combinerbeamsplitter that separatesthe visible and IR components of the beams The visibleportion of the beam passes through the combiner and isreected by a spherical converging mirror to form anexit pupil through which the image is transferred

Figure 1 Simplied operation of the shared-aperture displaytracking system

4 PRESENCE VOLUME 12 NUMBER 1

through the entrance pupil of the eye to the retina TheIR portion of the beam is reected in the opposite di-rection from the beamsplitter resulting in a raster of IRlight being scanned within the same eld of view as thedisplay image

Because the VRD is mounted on headgear the out-going tracking scanning pattern is aligned with the dis-play scanning pattern Given the known raster scanningpattern of both the IR and visible beams we can deter-mine the instantaneous orientation and position of thedisplay scene using a set of infrared detectors installed inthe environment (for example the cockpit in gure 1)These detectors measure the instant when the scanningIR beam is coincident with each detector The proposedtracking system is based on measuring the timing (timeduration between measured events) within the rasterscan of the display

A timing measurement from the detector determinesthe position of the display raster relative to the positionof the detector in two dimensions If several detectorsare used the absolute position and orientation (for ex-ample six degrees of freedom) of the head relative tothe outside world can be calculated We describe in sub-section 22 some of the detection techniques we havedeveloped

22 Detection Techniques

The easiest way to determine the vertical orienta-tion of the display relative to the detectors is to countthe number of raster lines from the top of the displayframe to the instant a line passes a detector Given theinstantaneous vertical eld of view of the display theline count accurately determines how far down theframe the display has scanned when the detector isreached Because several scan lines may reach the detec-tor an analysis of successive scan lines allows the proces-sor to pick the middle line or interpolate between linesHowever the determination of the horizontal orienta-tion is a far more difcult issue The following para-graphs explain how an accurate determination of thehorizontal orientation is obtained

To determine horizontal orientation of the displayimage the detected IR pulse can be timed from the

start of each display frame as in the vertical case How-ever the exact horizontal scanning frequency in theVRDrsquos mechanical resonant scanner (MRS) is difcultto measure to a high degree of accuracy Therefore theresulting error in determining a horizontal locationwithin the overall raster scan (pixel error) dramaticallyincreases if the detected signal is referred to only thebeginning of the display frame This error accumulateswith each successive scan line Even though the methodof counting from the beginning of each frame is accu-rate enough for determining the vertical location as be-fore the horizontal location within a scan line needs tobe determined separately from the vertical location

A more straightforward method for determining thehorizontal location within a scan line is to refer to theMRS driving signal at the actual beginning of every scanline By counting the time from the beginning of thatline to the detection event the horizontal position ofthe display scene relative to the detector is determinedBut this assumes that there is stability between the ac-tual locations of the mirror location relative to the MRSdrive signal Unfortunately this may not be the caseWe have observed that the natural or resonant fre-quency of the MRS varies with environmental changessuch as temperature As a result the phase of the drivingsignal no longer matches the actual phase of the scan-ner

To negate the effect of such a phase drift the actualbeginning of the scan line is needed Our initial ap-proach to measure the actual beginning of the scan linewas to put an optical sensor at the edge of the scannedeld to detect the beginning of every scan line How-ever the performance was susceptible to slight mechani-cal misalignment between the sensor strip and the actualscanned eld

To overcome this problem we developed a dual de-tector approach (Refer to gure 2 and 3) Because mul-tiple IR scan lines are likely to be detected in eachscanned frame the time between each IR pulse deter-mines the general location within the scan line How-ever with one detector the direction of the horizontalscan when the beam rst hits the detector in any givenframe (such as going from left to right or vice versa) isarbitrary Therefore a second detector is placed to the

Chinthammit et al 5

side of the rst detector along the horizontal scan axisas illustrated in gure 2 creating multiple IR pulsesfrom each detector as shown in gure 3 Because thescanned beam strikes the two detectors sequentially thescan direction can be determined by the sequence ofdetection events

To obtain a precise horizontal location within thescan line one of the two detectors is chosen as the ref-erence As shown in gure 3 time detector A is the ref-erence detector and the timing regiment is when the IRbeam strikes detector A to when the returning beamalso strikes detector A We assume that this time dura-tion or measured pulse width is twice as long as the timefrom detector A to the scan edge or the actual begin-ning of the scan line The corresponding horizontal lo-cation is then calculated by halving this timing durationIn this way the beginning of the scan line is inferredand the location of the detector along the horizontalline relative to that beginning is determined Becauseseveral scan lines are likely to pass by the detector wecollect all of the events and pick the one in the middle(that is the middle of an odd set of scan lines or inter-polate if there are an even number) to determine thehorizontal location of the raster relative to the detector

To reiterate the timing from the beginning of theframe to the time when a beam hits detector A is usedto calculate the corresponding vertical location of theraster relative to the detector Timing from the intervalof successive passes of scan lines across the detectors isused to determine the horizontal location of the rasterrelative to the detector By knowing the location of thedetectors in the cockpit the relative orientation of thedisplay raster in azimuth and elevation angles can becalculated By locating three or more detectors in the

real-world environment that are within the instanta-neous external scan eld the position and orientation ofthe display raster in six degrees of freedom can be deter-mined This aspect is discussed in subsection 23

23 Tracking Principle

As stated a detectorrsquos 2D horizontal and verticallocation within the raster scan is the basis upon whichwe make all of our calculations of position and orienta-tion of the display raster in space We start by trackingin two dimensions in the time domain and then by cal-culation from multiple detectors relate that to trackingin the 3D spatial domain

231 Tracking in the Time Domain Track-ing in the time domain is dened in two dimensions as(pixel line) coordinates for horizontal and vertical loca-tions in the tracking raster respectively We use astraightforward way to calculate the corresponding pixeland line positions from the timing measurement ob-tained in the detection system For the pixel calculationthe horizontal timing is divided by a pixel clock (40nspixel) to obtain the corresponding pixel locationFor the line calculation the vertical timing is used todirectly calculate the corresponding line location by di-viding by the scan line time duration It is worth notingthat the error from not having an absolute scan linetime duration of the MRS is unlikely to have much ef-fect on the line accuracy This is because the truncatedresult does not likely reect the error Consequentiallywe can obtain a (pixel line) coordinate of the target(detector) in two dimensions

Even though 2D tracking in the time domain can beobtained via a single detector we are only capable ofsuperimposing an image or overlay in the direction ofthat single detector For example if the VRD is to su-perimpose an image slightly to the side of the detectorthe registration of the augmented image is changed inlocation slightly as the detector moves It is due to non-uniformity in pixel location and spacing within the dis-play FOV Consequently target-tracking applicationsare limited To fully use the SARSDT as an AR systemit is necessary to know where each pixel in the display

Figure 2 The scanned IR eld

6 PRESENCE VOLUME 12 NUMBER 1

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 4: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

tions (Kutulakos amp Vallino 1998 Kato amp Billinghurst1999) Furthermore when a single point within theFOV is tracked across the 2D projection of the 3Dscene simplied image-plane interaction techniques forAR become feasible (Pierce et al 1997) This shared-aperture display and tracking system can therefore allevi-ate shortcomings of current head-tracking systems asdiscussed previously

2 System Concept

We rst describe a fundamental concept of ourproposed shared-aperture retinal scanning display andtracking system It illustrates how the VRD optical pathcan be used simultaneously as an optical tracking sys-tem Then several detection techniques are describedIt explains what techniques we have implemented todetect an optical scanning beam Lastly we explain thetracking principles of our SARSDT system

21 Shared Aperture Retinal ScanningDisplay and Tracking System

Simply stated the SARSDT is a VRD that hasbeen modied to provide a tracking function by sharing

coaxially the optical path of the VRD (Figure 1 shows asimplied diagram of the system) Visible and IR lasersgenerate a multispectral beam of infrared and visiblelight The visible beam is modulated by the video signal(as in the normal VRD) and the IR beam is unmodu-lated The IR and visual beams are combined beforebeing manipulated by two mirrors which scan thebeams in a raster pattern A high-speed resonant scannerscans the beams back and forth in the horizontal axiswhile a galvanometer mirror scans the beams in the ver-tical axis (Note that in the case of the VRD the displayis generated by active raster lines that are scanned inboth directions left to right and right to left as opposedto normal NTSC raster lines which are only scanned ina left-to-right direction) To draw the raster the hori-zontal scanner moves the beam to draw a row of pixelsat approximately 1575 kHz then the vertical scannermoves the beam at VGA standards to the next linewhere another row of pixels is drawn with all lines re-freshed at 60 Hz

The scanned beam is then passed through a wave-length selective combinerbeamsplitter that separatesthe visible and IR components of the beams The visibleportion of the beam passes through the combiner and isreected by a spherical converging mirror to form anexit pupil through which the image is transferred

Figure 1 Simplied operation of the shared-aperture displaytracking system

4 PRESENCE VOLUME 12 NUMBER 1

through the entrance pupil of the eye to the retina TheIR portion of the beam is reected in the opposite di-rection from the beamsplitter resulting in a raster of IRlight being scanned within the same eld of view as thedisplay image

Because the VRD is mounted on headgear the out-going tracking scanning pattern is aligned with the dis-play scanning pattern Given the known raster scanningpattern of both the IR and visible beams we can deter-mine the instantaneous orientation and position of thedisplay scene using a set of infrared detectors installed inthe environment (for example the cockpit in gure 1)These detectors measure the instant when the scanningIR beam is coincident with each detector The proposedtracking system is based on measuring the timing (timeduration between measured events) within the rasterscan of the display

A timing measurement from the detector determinesthe position of the display raster relative to the positionof the detector in two dimensions If several detectorsare used the absolute position and orientation (for ex-ample six degrees of freedom) of the head relative tothe outside world can be calculated We describe in sub-section 22 some of the detection techniques we havedeveloped

22 Detection Techniques

The easiest way to determine the vertical orienta-tion of the display relative to the detectors is to countthe number of raster lines from the top of the displayframe to the instant a line passes a detector Given theinstantaneous vertical eld of view of the display theline count accurately determines how far down theframe the display has scanned when the detector isreached Because several scan lines may reach the detec-tor an analysis of successive scan lines allows the proces-sor to pick the middle line or interpolate between linesHowever the determination of the horizontal orienta-tion is a far more difcult issue The following para-graphs explain how an accurate determination of thehorizontal orientation is obtained

To determine horizontal orientation of the displayimage the detected IR pulse can be timed from the

start of each display frame as in the vertical case How-ever the exact horizontal scanning frequency in theVRDrsquos mechanical resonant scanner (MRS) is difcultto measure to a high degree of accuracy Therefore theresulting error in determining a horizontal locationwithin the overall raster scan (pixel error) dramaticallyincreases if the detected signal is referred to only thebeginning of the display frame This error accumulateswith each successive scan line Even though the methodof counting from the beginning of each frame is accu-rate enough for determining the vertical location as be-fore the horizontal location within a scan line needs tobe determined separately from the vertical location

A more straightforward method for determining thehorizontal location within a scan line is to refer to theMRS driving signal at the actual beginning of every scanline By counting the time from the beginning of thatline to the detection event the horizontal position ofthe display scene relative to the detector is determinedBut this assumes that there is stability between the ac-tual locations of the mirror location relative to the MRSdrive signal Unfortunately this may not be the caseWe have observed that the natural or resonant fre-quency of the MRS varies with environmental changessuch as temperature As a result the phase of the drivingsignal no longer matches the actual phase of the scan-ner

To negate the effect of such a phase drift the actualbeginning of the scan line is needed Our initial ap-proach to measure the actual beginning of the scan linewas to put an optical sensor at the edge of the scannedeld to detect the beginning of every scan line How-ever the performance was susceptible to slight mechani-cal misalignment between the sensor strip and the actualscanned eld

To overcome this problem we developed a dual de-tector approach (Refer to gure 2 and 3) Because mul-tiple IR scan lines are likely to be detected in eachscanned frame the time between each IR pulse deter-mines the general location within the scan line How-ever with one detector the direction of the horizontalscan when the beam rst hits the detector in any givenframe (such as going from left to right or vice versa) isarbitrary Therefore a second detector is placed to the

Chinthammit et al 5

side of the rst detector along the horizontal scan axisas illustrated in gure 2 creating multiple IR pulsesfrom each detector as shown in gure 3 Because thescanned beam strikes the two detectors sequentially thescan direction can be determined by the sequence ofdetection events

To obtain a precise horizontal location within thescan line one of the two detectors is chosen as the ref-erence As shown in gure 3 time detector A is the ref-erence detector and the timing regiment is when the IRbeam strikes detector A to when the returning beamalso strikes detector A We assume that this time dura-tion or measured pulse width is twice as long as the timefrom detector A to the scan edge or the actual begin-ning of the scan line The corresponding horizontal lo-cation is then calculated by halving this timing durationIn this way the beginning of the scan line is inferredand the location of the detector along the horizontalline relative to that beginning is determined Becauseseveral scan lines are likely to pass by the detector wecollect all of the events and pick the one in the middle(that is the middle of an odd set of scan lines or inter-polate if there are an even number) to determine thehorizontal location of the raster relative to the detector

To reiterate the timing from the beginning of theframe to the time when a beam hits detector A is usedto calculate the corresponding vertical location of theraster relative to the detector Timing from the intervalof successive passes of scan lines across the detectors isused to determine the horizontal location of the rasterrelative to the detector By knowing the location of thedetectors in the cockpit the relative orientation of thedisplay raster in azimuth and elevation angles can becalculated By locating three or more detectors in the

real-world environment that are within the instanta-neous external scan eld the position and orientation ofthe display raster in six degrees of freedom can be deter-mined This aspect is discussed in subsection 23

23 Tracking Principle

As stated a detectorrsquos 2D horizontal and verticallocation within the raster scan is the basis upon whichwe make all of our calculations of position and orienta-tion of the display raster in space We start by trackingin two dimensions in the time domain and then by cal-culation from multiple detectors relate that to trackingin the 3D spatial domain

231 Tracking in the Time Domain Track-ing in the time domain is dened in two dimensions as(pixel line) coordinates for horizontal and vertical loca-tions in the tracking raster respectively We use astraightforward way to calculate the corresponding pixeland line positions from the timing measurement ob-tained in the detection system For the pixel calculationthe horizontal timing is divided by a pixel clock (40nspixel) to obtain the corresponding pixel locationFor the line calculation the vertical timing is used todirectly calculate the corresponding line location by di-viding by the scan line time duration It is worth notingthat the error from not having an absolute scan linetime duration of the MRS is unlikely to have much ef-fect on the line accuracy This is because the truncatedresult does not likely reect the error Consequentiallywe can obtain a (pixel line) coordinate of the target(detector) in two dimensions

Even though 2D tracking in the time domain can beobtained via a single detector we are only capable ofsuperimposing an image or overlay in the direction ofthat single detector For example if the VRD is to su-perimpose an image slightly to the side of the detectorthe registration of the augmented image is changed inlocation slightly as the detector moves It is due to non-uniformity in pixel location and spacing within the dis-play FOV Consequently target-tracking applicationsare limited To fully use the SARSDT as an AR systemit is necessary to know where each pixel in the display

Figure 2 The scanned IR eld

6 PRESENCE VOLUME 12 NUMBER 1

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 5: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

through the entrance pupil of the eye to the retina TheIR portion of the beam is reected in the opposite di-rection from the beamsplitter resulting in a raster of IRlight being scanned within the same eld of view as thedisplay image

Because the VRD is mounted on headgear the out-going tracking scanning pattern is aligned with the dis-play scanning pattern Given the known raster scanningpattern of both the IR and visible beams we can deter-mine the instantaneous orientation and position of thedisplay scene using a set of infrared detectors installed inthe environment (for example the cockpit in gure 1)These detectors measure the instant when the scanningIR beam is coincident with each detector The proposedtracking system is based on measuring the timing (timeduration between measured events) within the rasterscan of the display

A timing measurement from the detector determinesthe position of the display raster relative to the positionof the detector in two dimensions If several detectorsare used the absolute position and orientation (for ex-ample six degrees of freedom) of the head relative tothe outside world can be calculated We describe in sub-section 22 some of the detection techniques we havedeveloped

22 Detection Techniques

The easiest way to determine the vertical orienta-tion of the display relative to the detectors is to countthe number of raster lines from the top of the displayframe to the instant a line passes a detector Given theinstantaneous vertical eld of view of the display theline count accurately determines how far down theframe the display has scanned when the detector isreached Because several scan lines may reach the detec-tor an analysis of successive scan lines allows the proces-sor to pick the middle line or interpolate between linesHowever the determination of the horizontal orienta-tion is a far more difcult issue The following para-graphs explain how an accurate determination of thehorizontal orientation is obtained

To determine horizontal orientation of the displayimage the detected IR pulse can be timed from the

start of each display frame as in the vertical case How-ever the exact horizontal scanning frequency in theVRDrsquos mechanical resonant scanner (MRS) is difcultto measure to a high degree of accuracy Therefore theresulting error in determining a horizontal locationwithin the overall raster scan (pixel error) dramaticallyincreases if the detected signal is referred to only thebeginning of the display frame This error accumulateswith each successive scan line Even though the methodof counting from the beginning of each frame is accu-rate enough for determining the vertical location as be-fore the horizontal location within a scan line needs tobe determined separately from the vertical location

A more straightforward method for determining thehorizontal location within a scan line is to refer to theMRS driving signal at the actual beginning of every scanline By counting the time from the beginning of thatline to the detection event the horizontal position ofthe display scene relative to the detector is determinedBut this assumes that there is stability between the ac-tual locations of the mirror location relative to the MRSdrive signal Unfortunately this may not be the caseWe have observed that the natural or resonant fre-quency of the MRS varies with environmental changessuch as temperature As a result the phase of the drivingsignal no longer matches the actual phase of the scan-ner

To negate the effect of such a phase drift the actualbeginning of the scan line is needed Our initial ap-proach to measure the actual beginning of the scan linewas to put an optical sensor at the edge of the scannedeld to detect the beginning of every scan line How-ever the performance was susceptible to slight mechani-cal misalignment between the sensor strip and the actualscanned eld

To overcome this problem we developed a dual de-tector approach (Refer to gure 2 and 3) Because mul-tiple IR scan lines are likely to be detected in eachscanned frame the time between each IR pulse deter-mines the general location within the scan line How-ever with one detector the direction of the horizontalscan when the beam rst hits the detector in any givenframe (such as going from left to right or vice versa) isarbitrary Therefore a second detector is placed to the

Chinthammit et al 5

side of the rst detector along the horizontal scan axisas illustrated in gure 2 creating multiple IR pulsesfrom each detector as shown in gure 3 Because thescanned beam strikes the two detectors sequentially thescan direction can be determined by the sequence ofdetection events

To obtain a precise horizontal location within thescan line one of the two detectors is chosen as the ref-erence As shown in gure 3 time detector A is the ref-erence detector and the timing regiment is when the IRbeam strikes detector A to when the returning beamalso strikes detector A We assume that this time dura-tion or measured pulse width is twice as long as the timefrom detector A to the scan edge or the actual begin-ning of the scan line The corresponding horizontal lo-cation is then calculated by halving this timing durationIn this way the beginning of the scan line is inferredand the location of the detector along the horizontalline relative to that beginning is determined Becauseseveral scan lines are likely to pass by the detector wecollect all of the events and pick the one in the middle(that is the middle of an odd set of scan lines or inter-polate if there are an even number) to determine thehorizontal location of the raster relative to the detector

To reiterate the timing from the beginning of theframe to the time when a beam hits detector A is usedto calculate the corresponding vertical location of theraster relative to the detector Timing from the intervalof successive passes of scan lines across the detectors isused to determine the horizontal location of the rasterrelative to the detector By knowing the location of thedetectors in the cockpit the relative orientation of thedisplay raster in azimuth and elevation angles can becalculated By locating three or more detectors in the

real-world environment that are within the instanta-neous external scan eld the position and orientation ofthe display raster in six degrees of freedom can be deter-mined This aspect is discussed in subsection 23

23 Tracking Principle

As stated a detectorrsquos 2D horizontal and verticallocation within the raster scan is the basis upon whichwe make all of our calculations of position and orienta-tion of the display raster in space We start by trackingin two dimensions in the time domain and then by cal-culation from multiple detectors relate that to trackingin the 3D spatial domain

231 Tracking in the Time Domain Track-ing in the time domain is dened in two dimensions as(pixel line) coordinates for horizontal and vertical loca-tions in the tracking raster respectively We use astraightforward way to calculate the corresponding pixeland line positions from the timing measurement ob-tained in the detection system For the pixel calculationthe horizontal timing is divided by a pixel clock (40nspixel) to obtain the corresponding pixel locationFor the line calculation the vertical timing is used todirectly calculate the corresponding line location by di-viding by the scan line time duration It is worth notingthat the error from not having an absolute scan linetime duration of the MRS is unlikely to have much ef-fect on the line accuracy This is because the truncatedresult does not likely reect the error Consequentiallywe can obtain a (pixel line) coordinate of the target(detector) in two dimensions

Even though 2D tracking in the time domain can beobtained via a single detector we are only capable ofsuperimposing an image or overlay in the direction ofthat single detector For example if the VRD is to su-perimpose an image slightly to the side of the detectorthe registration of the augmented image is changed inlocation slightly as the detector moves It is due to non-uniformity in pixel location and spacing within the dis-play FOV Consequently target-tracking applicationsare limited To fully use the SARSDT as an AR systemit is necessary to know where each pixel in the display

Figure 2 The scanned IR eld

6 PRESENCE VOLUME 12 NUMBER 1

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 6: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

side of the rst detector along the horizontal scan axisas illustrated in gure 2 creating multiple IR pulsesfrom each detector as shown in gure 3 Because thescanned beam strikes the two detectors sequentially thescan direction can be determined by the sequence ofdetection events

To obtain a precise horizontal location within thescan line one of the two detectors is chosen as the ref-erence As shown in gure 3 time detector A is the ref-erence detector and the timing regiment is when the IRbeam strikes detector A to when the returning beamalso strikes detector A We assume that this time dura-tion or measured pulse width is twice as long as the timefrom detector A to the scan edge or the actual begin-ning of the scan line The corresponding horizontal lo-cation is then calculated by halving this timing durationIn this way the beginning of the scan line is inferredand the location of the detector along the horizontalline relative to that beginning is determined Becauseseveral scan lines are likely to pass by the detector wecollect all of the events and pick the one in the middle(that is the middle of an odd set of scan lines or inter-polate if there are an even number) to determine thehorizontal location of the raster relative to the detector

To reiterate the timing from the beginning of theframe to the time when a beam hits detector A is usedto calculate the corresponding vertical location of theraster relative to the detector Timing from the intervalof successive passes of scan lines across the detectors isused to determine the horizontal location of the rasterrelative to the detector By knowing the location of thedetectors in the cockpit the relative orientation of thedisplay raster in azimuth and elevation angles can becalculated By locating three or more detectors in the

real-world environment that are within the instanta-neous external scan eld the position and orientation ofthe display raster in six degrees of freedom can be deter-mined This aspect is discussed in subsection 23

23 Tracking Principle

As stated a detectorrsquos 2D horizontal and verticallocation within the raster scan is the basis upon whichwe make all of our calculations of position and orienta-tion of the display raster in space We start by trackingin two dimensions in the time domain and then by cal-culation from multiple detectors relate that to trackingin the 3D spatial domain

231 Tracking in the Time Domain Track-ing in the time domain is dened in two dimensions as(pixel line) coordinates for horizontal and vertical loca-tions in the tracking raster respectively We use astraightforward way to calculate the corresponding pixeland line positions from the timing measurement ob-tained in the detection system For the pixel calculationthe horizontal timing is divided by a pixel clock (40nspixel) to obtain the corresponding pixel locationFor the line calculation the vertical timing is used todirectly calculate the corresponding line location by di-viding by the scan line time duration It is worth notingthat the error from not having an absolute scan linetime duration of the MRS is unlikely to have much ef-fect on the line accuracy This is because the truncatedresult does not likely reect the error Consequentiallywe can obtain a (pixel line) coordinate of the target(detector) in two dimensions

Even though 2D tracking in the time domain can beobtained via a single detector we are only capable ofsuperimposing an image or overlay in the direction ofthat single detector For example if the VRD is to su-perimpose an image slightly to the side of the detectorthe registration of the augmented image is changed inlocation slightly as the detector moves It is due to non-uniformity in pixel location and spacing within the dis-play FOV Consequently target-tracking applicationsare limited To fully use the SARSDT as an AR systemit is necessary to know where each pixel in the display

Figure 2 The scanned IR eld

6 PRESENCE VOLUME 12 NUMBER 1

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 7: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

actually overlays a location in the real worldmdashthe spatialinformation

232 Tracking in the Spatial Domain Tofully determine the intercept of each raster pixel (be-yond the single pixel detected previously) it is necessaryto determine the position of that raster pixel relative tothe real world By locating several detectors whose loca-tions in the real-world environment are known the po-sition and orientation of the display raster in six degreesof freedom can be determined

A fundamental problem of converting timing infor-mation into spatial information in the world coordinateis an orientation offset in three-dimensional space be-tween the IR scanned eld axes and the world axes(Chinthammit Seibel amp Furness 2002) The solutionto this problem is to determine a six-DOF transforma-tion between the world coordinates and the VRD coor-dinates

Generally this problem can be solved by triangulationtechniques in which the detectorsrsquo positions are known

relative to the real world Typically these techniquesrequire that distances from emittersource to each de-tector are known For example in the event of an earth-quake a seismograph can detect an earthquake magni-tude but not a precise direction and depth where theepicenter is located At least three seismographs areneeded to triangulate a precise epicenter see an exampleby Azuma and Ward (1991) Because our detection sys-tem is based on the timing incident in 2D raster scansingle-pixel detection cannot adequately determine thedistance Theoretically an infrared sensor unit can pos-sibly determine the distance from the signal strength ofthe received signal However the nature of our scan-ning eld causes an infrared beam to travel quickly overa sensing area of an infrared detector This makes it im-practical to accurately determine the distance from asignal strength level

This single-pixel detection problem in 2D coordinatescan commonly be addressed as a computer vision prob-lem such as a 3D reconstruction Therefore this trian-gulation problem can also be solved by computer vision

Figure 3 The pulse train from the dual-detector system

Chinthammit et al 7

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 8: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

techniques as well In our application a distance from aldquocamerardquo to each ldquomarkerrdquo is unknown whereas dis-tances of each marker (detector) relative to the referenceare known Our known markers are located in a 2D co-ordinate (pixel and line) through a captured videoframe The reconstruction performs on a principle that amarkerrsquos 3D position in the real world and the corre-sponding 2D coordinate on the camera screen must liealong the same ray formed by the camera projection Ifthe accurate 3D position estimations were found thedistance between markers would be equal to the physi-cal distances Generally the 3D position estimation issolved by an optimization process which can be timeconsuming to reach the optimal solution (Janin ZikanMizell Banner amp Sowizral 1994)

Another computer vision technique that requires lessoptimization process is to congure four markers intoeach vertex of the square Using the fact that there aretwo sets of parallel sides and their direction vectors per-pendicular to each other an image analysis algorithm isused to estimate the position and orientation of thecamera with respect to the square plane The optimiza-tion process is applied only to compensate for the errorin detected vertices coordinates that cause the directionvectors of the two parallel sides of the square not per-pendicular to each other Details on this developmentand algorithm are explained by Kato and Billinghurst(1999) Due to its simplicity we implement this imageanalysis technique on our hardware system to determinea six-DOF of detectorsrsquo plane with respect to the VRD

In gure 4 looking out from the VRD we dene areference 2D plane at a xed distance Zref which is animaginary distance away from the origin of the VRDcoordinate (The choice of Zref does not affect the over-all tracking accuracy) A projection of four vertices onour reference 2D plane represents vertices on the cam-era screen as implemented in the original algorithm(Kato amp Billinghurst 1999) Because a 2D coordinateis dened in the VRD coordinate converting timinginformation into spatial information becomes a muchsimpler issue The horizontal and vertical coordinate isdirectly obtained from (pixel line) coordinates in thetime domain Our 2D coordinates (or displacements) of

four vertices on that plane can be calculated by the fol-lowing equations

Displacementhorizontal ~t 5 2Kx

2 cos~2p freqx tx

(1)

Displacementvertical ~t 5 Ky z ty z freqY 2Ky

2

We dene subscripts (xy) as the directions along theMRSrsquos moving axis and the galvanometerrsquos moving axisrespectively Therefore (xy) can also be referred to asthe directions in the horizontal and vertical axes of theVRD scan The t is the timing measurement in bothaxes or (pixel line) coordinates from detections in timedomain Displacement is a calculated position in respec-tive axes although their calculations are derived differ-ently for horizontal and vertical axes The calculation ofthe horizontal displacement is modeled to correct forthe distortion of the sinusoidal nature of the MRSwhereas a linear model is used in the calculation of thevertical displacement due to a linearity of the galvanom-eter motion The displacement at the center of the scanis set to (00) for convenience purposes The K is thedisplacement over the entire FOV of the scanner in re-spective axes at the distance Zref The freq is the scan-

Figure 4 The illustration of how we dene a 2D plane in the VRD

coordinate

8 PRESENCE VOLUME 12 NUMBER 1

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 9: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

ning frequency of both scanners In conclusion equa-tion (1) transforms our 2D coordinates in time domainto 2D positions in the spatial domain

It is worth noting that this 2D position lies along thesame ray (formed by the scanner) with the actual 3Dposition in the real world Once the 2D position of eachvertex has been determined the image analysis algo-rithm can then determine the translation and orienta-tion of the square plane with respect to the scanner

3 Experimental Setting

This section rst describes all of the main appara-tuses in this experiment Then we describe our currentsystem conguration It is slightly different from whathave been described in subsection 21 in terms of amoving element Lastly performance metrics are de-ned as a way to characterize our proposed system

31 Experimental Apparatus

To characterize the utility and performance of theSARSDT concept we built a breadboard optical benchas shown in gure 7 Four subsystems compose the ex-perimental SARSDT a modied virtual retinal display arobot arm target mover a detection and data acquisi-tion subsystem and a reticle generator These sub-systems are described in more detail in the followingsubsections

311 The Virtual Retinal Display As stated inthe introduction the VRD scans laser light directly tothe retina of the eye to create augmented images It usestwo single-axis scanners to deect a beam of modulatedlaser light Currently the VRD scans at VGA format or640 3 480 pixel resolution The vertical scan rate isequivalent to the progressive frame rate Because the 60Hz vertical scan rate is relatively low an existing galva-nometer scanner is used as the vertical scanner (Cam-bridge Instruments) Galvanometers are capable of scan-ning through wide angles but at frequencies that aremuch lower than the required horizontal frequenciesEarly in our development of the VRD we built a me-

chanical resonant scanner (MRS) (Johnston amp Willey1995) to meet the horizontal scanner requirementsThe only moving part is a torsional springmirror com-bination used in a ux circuit Eliminating moving coilsor magnets (contained in existing mechanical resonantscanners) greatly lowers the rotational inertia of the de-vice thus a high operational frequency can be achievedAnother unique feature of the MRS is its size (the mir-ror is 2 3 6 mm) The entire scanner can be very smalland is being made smaller as a micro-electromechanicalsystem (MEMS) (Wine Helsel Jenkins Urey amp Os-born 2000)

To extend the eld of view of the horizontal scannerwe arrange the scanning mirrors to cause the beam toreect twice between the MRS and galvanometer mir-rors as shown in gure 5 This approach effectively mul-tiplies the beam deection by 4 relative to the actualrotational angle of the horizontal scanner

As shown in gure 6 the scanned beams passthrough a beamsplitter (extended hot mirror EdmundScientic) and spherical optical mirror to converge andcollimate the visible light at the exit pupil When theviewer aligns her eye entrance pupil with the exit pupilof the VRD she perceives a high-luminance display atoptical innity that is overlaid onto the natural view ofthe surround

312 Light Sources For convenience in theSARSDT breadboard system we use only one wave-length of visible light (red) and one IR wavelength Thevisible light source is a directly modulated red laser di-

Figure 5 Two scanning mirrors

Chinthammit et al 9

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 10: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

ode (635 nm Melles Griot Inc) that is ber opticallycoupled The IR laser diode (1310 nm Newport Corp)is not modulated Both visible and IR light are com-bined using an IR beroptic combiner (Newport Corp)into a single multispectral beam The result is a multiplespectra beam The collimation of the mixed wavelengthbeam is individually optimized for the IR and visible wave-length before being scanned by the two-axis scanner

313 Wavelength-Selective BeamsplitterFor the SARSDT the normal beamsplitter in the VRDconguration is replaced by a wavelength-selectivebeamsplitter or hot mirror This allows the red visiblewavelength to pass through the beamsplitter with lowattenuation while being highly reective to the infraredwavelength (1310 nm) As a result the wavelength-selective beamsplitter creates two simultaneous scans avisible one into the eye and an IR one into the environ-ment

314 Robot-Arm Target Mover A ve-DOFrobot arm (Eshed Robotec Scorbot-ER 4PC) is used totest the tracking performance of the system Because therst conguration of this concept is installed on an opti-cal bench it is difcult for us to move the display (aswould be the case with the eventual head-coupled ver-sion of the system) relative to xed detectors in the en-vironment Instead we decided to move the detectorsusing the robot arm In case of the target-tracking con-guration (see subsection 32) we use the robot arm tomove a detection circuit at the predetermined trajec-tory equivalent to moving the head in the opposite tra-jectory The robot arm provides a repeatable movementand reports the ldquotruthrdquo position The target mover sys-tem is independently operated from the tracking systemby a separate computer Robot arm positions in veDOF (no roll axis) are recorded at 8 Hz using customsoftware and an interface The reported positions canthen be used for evaluating tracking performances

Figure 6 Optical relay components

10 PRESENCE VOLUME 12 NUMBER 1

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 11: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

315 Detection and Data Acquisition Sys-tem The detection system consists of infrared detec-tors (GPD GAP 300) a low-noise amplier and a pro-cessing unit Two IR detectors and a low-noise amplierare built into one circuit contained in an aluminum boxattached to the robot arm in gure 7 The signal gener-ated from the low-noise amplier is then sent to theprocessing unit It consists of a set of ip-ops thatmeasure specic time durations between detected sig-nals and reference signals The circuit architecture de-pends on the detection technique that is selected

The output of the detection system is sampled by anacquisition system It is a timercounter board (Na-tional Instruments) with an 80 MHz clock speed Agraphical programming language Labview is pro-grammed to interface with the acquisition board TheLabview program is also written to determine the detec-torsrsquo locations within a display scan A pair of line andpixel numbers dene the location in two dimensionsand these values are then digitally output to a displaysystem

316 Reticle Generator Subsystem To visu-ally indicate the output of the tracking part of theSARSDT we generate and project a reticle using thedisplay portion of the VRD system But instead of pro-jecting on the retina we turn the visible beam around

and project it on the target being moved by the robotarm The reticle generator draws a crosshair reticle witha center at the calculated pixel and line locations Thegenerator consists of two identical sets of counters onefor pixel count and another for line count Each has dif-ferent sets of reference signals from the VRD electron-ics the beginning of the frame signal (BOF) and thebeginning of the scan line signal (BOL) for line countand BOL and pixel clock for pixel count The linecounters are reset by the BOF and counted by the BOLSimilarly the pixel counters are reset by the BOL andcounted by the pixel clock Both continue to count un-til the reported pixel and line number are reachedTurning on the same pixel position in every scan linecreates the vertical portion of the crosshair whereasturning all pixels in the reported line generates the hori-zontal crosshair symbol

32 System Con guration

As stated earlier it is difcult for us to move theVRD (as would be the case with the eventual head-coupled version of the system) relative to xed detectorsin the environment Thus we decided to move the de-tectors using the robot arm as in the case of the target-tracking conguration The combination of the robotarm and the VRD projector as described above effec-tively simulates moving the VRD and the userrsquos head inthe environment relative to xed locations in space(That is the VRD is xed and the IR detectors are be-ing moved by the robot arm system as illustrated ingure 7)

33 Performance Metrics

To characterize the performance of this trackingsystem we consider the following performance metrics

331 Stability The trackerrsquos stability refers tothe repeatability of the reported positions when the de-tectors are stationary It also refers to the precision ofthe tracking system Ideally the data recorded over timeshould have no distribution or zero standard deviationUnfortunately the actual results are not ideal Inherent

Figure 7 SARSDT as targeting system

Chinthammit et al 11

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 12: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

in the SARSDT are two error sources measurementerror and scanner jitter Both are random errors thatlimit the tracking precision and therefore cannot becompensated by calibration procedures

332 Static Error Static error is dened as errorthat occurs when the user is stationary The sources ofstatic errors are calibration error and tracker error Cali-bration procedures are used to calibrate a tracking sys-tem with a display system so that augmented image isaligned correctly The calibration parameters associatedwith the tracking system are transformations among thescanner the sensor unit and the world coordinates Theparameters associated with the display system are theeld of view and the nonlinearity of the MRS Inaccu-rate parameters cause systematic errors which can becompensated by calibration procedures Furthermoretracker error also contributes to the static error

333 Dynamic Error Dynamic error is denedas error that occurs when the user is in motion Theprimary source of dynamic error is system latency whichis dened as the time duration from sensing a positionto rendering images It includes the computational timerequired in the tracking system as well as communica-tion time required between subsystems (such as displaysystem sensing unit and the computing unit) A signi-cant latency causes the virtual image relative to the realobject in the environment to appear to trail or lag be-hind during a head movement

4 Experiments Results and Discussions

This section describes different experiments thatare derived from the performance metrics (See subsec-tion 33) Results and discussions are discussed in eachexperiment Lastly future works are discussed

41 The Stability of the DetectionSystem

The stability of the detection circuit was obtainedby analyzing the recorded timing data when the target

detectors were stationary in the 8 3 65 deg FOV ofthe scan The standard deviations of the data were mea-sured and the ranges are reported as 12 3 standarddeviations The horizontal axis and the vertical axis wereanalyzed separately The vertical stabilityprecision iswithin 00126 deg or one scan line and the horizontalstability is 12 00516 deg or 55 pixels Unlike thevideo-based tracking system the precision of the proto-type SARSDT system is not limited by the pixelateddisplay resolution (640 3 480 within the eld of view)but rather by how repeatable and stable the timing mea-surement is

The pixel error along the horizontal axis originatesfrom two main sources the VRD scanner introduces anelectromechanical jitter and our current software opera-tional system Windows 98 is not capable of consis-tently acquiring data between frames at the exact displayrefresh rate Neglected are environmental noise sourcessuch as mirror surface degradation scattering from dustdetector shot noise vibration and so forth

Both sources of precision error contribute to the tim-ing uctuation in the recorded data between framesFirst multiple pixels are detected from different scanlines in one frame Each scan line generates slightly dif-ferent timing information due to the orientation of theIR sensing area with respect to the angle of the scannedbeam producing small timing uctuations among mul-tiple detected timing measurements or pulses in anysingle frame Because the software operational system isunable to control the precise acquisition time the orderof the pulses in the processing data buffer does not cor-respond exactly to its proper IR detectorrsquos physical sens-ing area For example the rst data in the buffer are notnecessarily related to the top area of the sensing areawhere a beam hits rst

One method that can improve the stability of the de-tection circuit is a detection technique that would aver-age multiple pulses across multiple scan lines Thusthere would be only one averaged pulse in each frameThis would reduce the error from the operational sys-tem by selecting one pulse with respect to several pulsesthat have variable timing measurements However addi-tional acquisition time is inevitable because the averag-ing techniques require the integration of detected pulses

12 PRESENCE VOLUME 12 NUMBER 1

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 13: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

over multiple scan lines Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg or 12 24 pixels if the partial elimination ofthe scanner jitter effect and ideal acquisition time aremanually and properly imposed

The source of line error along the vertical axis is likelyto be an electromechanical jitter from the VRD Be-cause the rst pulse of multiple pixels in a buffer deter-mines the line location the resulting jitter causes therst pulse to move slightly vertically Therefore we ex-pect a line error within one scan line

The stability of the detection system is a fundamentalerror of our tracking system It propagates through therest of the system error

42 Static Accuracy

We start by tracking in two dimensions in the timedomain and then by calculation from multiple detec-tors relate that to tracking in the 3D spatial domain

421 Tracking in the Time Domain Todemonstrate tracking in the time domain the dual-de-tector system was mounted on the robot arm systemand the VRD was xed in position as shown in gure 7Using the IR tracking system in time domain once thepixel location of the reference detector is calculated ared crosshair reticle was projected onto the reference IRdetector in the subsequent video frame The red cross-hair pattern from the VRD was superimposed onto thereference detector The crosshair reticle was one pixeland one scan line wide for the vertical and horizontallines respectively The robot arm was programmed tomove from one corner to the next within the current165 3 65 deg FOV of the scan The rectilinear mo-tion of the target was primarily planar with a z distanceof 30 in from the VRD scanner This tracking sequencecan be viewed on our Web site (Chinthammit BursteinSeibel amp Furness 2001)

A digital camera was set at the eye position to captureimages of the crosshair through a custom hot mirror asdescribed in the system conguration Figure 8 showsthe crosshair projected on the reference IR detector(left-side detector)

We then analyze the images to determine the staticerror The static error at the calibration point is consid-ered to be zero For each image the vertical and hori-zontal error was determined separately and in an angu-lar unit or degree Finally an average and a standarddeviation of both axes errors were calculated The re-sults were 008 12 004 and 003 12 002 deg forthe horizontal and vertical axes respectively

This static error is due to a calibration error in thedisplay system and the sinusoidal scan motion in thehorizontal axis The display calibration requires the de-termination of the phase shift between the electronicdrive frequency and the mechanical resonant frequencyThis phase shift is used to determine when the rst pixelof the scan line should be displayed to match the veryedge of the horizontal mechanical scan It can be doneonly visually and the calibration error is therefore inevi-table Due to the sinusoidal scan motion the calibrationerror magnitude is different along the horizontal scanBecause the calibration is taken at the middle of thescan the registration error increases as the detectormoves closer to the edge of the scan where the infraredbeam travels slower

Although the robot arm moved orthogonally fromthe z axis one advantage of the current 2D trackingsystem is that the technique can be operated in a 3Dspace without having to determine the absolute posi-tions which is a requirement for other tracking systemsThus the proposed technique has a high computational

Figure 8 Crosshair overlay on the detector

Chinthammit et al 13

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 14: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

efciency for tracking the designated object Addition-ally a more passive approach is to place retro-reectivetargets in the surroundings which concentrate all theactive system components at the display This makes thissystem applicable to nonmilitary and unconstrained ARapplications such as tracking the viewerrsquos hand wandor mouse For example Foxlin and Harrington (2000)have suggested wearable gaming applications and ldquoin-formation cockpitsrdquo that use a wireless hand trackerworn as a small ring The active acoustic emitter couldbe replaced with a passive retro-reective lm worn as asmall ring In virtual environments (VEs) an applicationfor tracking a single outstretched nger is to select ob-jects using line-of-sight 2D projections onto the 3Dscene A retro-reector at the tip of a nger or wand cancreate the same image plane selection manipulationand navigation in VEs similar to the ldquoSticky Fingerrdquoproposed by Pierce et al (1997) and Zeleznik LaViolaFeliz and Keefe (2002) Tracking two ngers addsmore degrees of freedom in the manipulation of virtualobjects in VE and AR such as the ldquoHead Crusherrdquotechnique that can select scale and rotate objects(Pierce et al 1997)

However one limitation of our dual-detector tech-nique is the fact that this is a line-of-sight tracking sys-tem the working range is limited by the acceptance an-gle of the VRD eld of view and the IR detectors Thedetection system has an acceptance angle of approxi-mately 12 60 deg limited by the detectorrsquos enclo-sure Furthermore the dual-detector scheme has a verylimited range of tracking target movement along theroll axis because the horizontal scan lines must strikeboth detectors (required by the dual-detector scheme)Currently the detectors are 55 mm apart and eachsensor diameter is approximately 300 microns Conse-quently the roll acceptance angle is less than 110deg

To increase the acceptance angle for target roll thedetector enclosure can be removed to bring the twodetectors in closer proximity If cells are placed side byside the roll acceptance is estimated to be 12 45 degTo further improve the range of roll motion a quad-photo detector can be used For extreme angles of roll

one pair of the dual detector system generates the pulsesbut not the other

422 Tracking in the Spatial Domain Asdescribed in the previous subsection the superimposedimage can only be accurately projected at the detectorand not anywhere else in the scan eld To overlay animage on locations other than the reference detectortransformation between a time domain and a spatial do-main is required

A selected image analysis algorithm (Kato amp Billing-hurst 1999) described in the tracking principal subsec-tion 232 is used to estimate the position and orienta-tion of the scanner with respect to the square planeOur goal in this experiment is to determine the perfor-mance of this image-analysis tracking algorithm withinour hardware system Beside from the fact that we builtonly one dual-detector system it is feasible with ourrepeatable target mover (robot arm) that we can ef-ciently conduct this experiment by moving a dual-detec-tor system to four predetermined positions to form fourvertices of the virtual square plane 60 3 60 mm in size

The experiment is set up to determine position andorientation (six DOF) of the square plane The center ofthis square is the origin of the Target coordinate systemThe six-DOF information is in a form of a transforma-tion matrix TTarget2VRD (The diagram of this experi-ment is illustrated in gure 9) We also need to takeinto account that there is a position offset between thereference detector and a tool center point (TCP) of therobot arm it is dened as Tbias To simplify the diagramwe do not include Tbias in the diagram

Figure 9 Transformation matrix diagram

14 PRESENCE VOLUME 12 NUMBER 1

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 15: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

The relative position and orientation of the square tothe robot arm coordinate is reported by a robot armsystem and dened in a transformation matrixTTarget2Robot Another transformation matrix isTRobot2VRD which related the robot arm coordinate tothe VRD coordinate This matrix has to be obtainedthrough a calibration process because the orientation ofthe VRD scan is difcult to measure directly During acalibration process the diagram in gure 9 is still validwith the exception that TTarget2VRD is now reverselyused to estimate TRobot2VRD Obviously a TRobot2VRD

estimate contains error from the tracking algorithmThus we conducted the calibration process over multi-ple positions and orientations throughout the scan eldto average out the error inicted by TTarget2VRD Thenal position and orientation of the calibratedTRobot2VRD have high correlation with our manual mea-surements Then the correction six-DOF or ldquotruthrdquotransformation matrix is determined from a transforma-tion matrix Ttruth which is a multiplication of two pre-viously mentioned transformation matrices TTarget2Robot

and TRobot2VRD The position and orientation obtainedfrom TTarget2VRD is compared to truth to determine thetracking accuracy

The virtual square is maintained at 60 3 60 mm andis tracked in six DOF within the eld of view of theVRD (185 3 6 deg) The experiment was conductedin the z axis range from 30ndash35 in away from the VRDscanner The error is dened as the difference betweenthe truth and the estimation pose and reported in de-grees and millimeters for the orientation and translationrespectively In translation the error in horizontal andvertical axes of the scanner is approximately 1 mm Amore signicant error is found along the z axis produc-ing an error of less than 12 mm within the FOV at30ndash35 in from the scanner In orientation the error ineach of (pitch yaw and roll) is within 1 deg

Overall results demonstrate that this image-analysistracking technique performs well A signicant problemof this technique is that the farther away from the scan-ner the virtual square is the projected vertices on the2D plane become increasingly susceptible to the errorintroduced during the detection process This error isoften referred to as a tracking resolution If a captured

video image is used for the tracking purposes its track-ing resolution is limited by the image resolution It isimportant to realize that our tracking resolution is notlimited by the VRD display resolution but instead bythe stability of our detection system discussed in subsec-tion 41 To further illustrate the point the reportederror in the original work (Kato amp Billinghurst 1999) isgreater than the error reported in this experimentHowever this image-analysis technique still providesour current system an alternative solution to our initialtriangulation problem A noticeable translation error inthe z axis is mainly because our current experimentalvolume is relatively farther away in that direction thanare other axes in the VRD coordinate In other words ifthe square is to be as far away in the horizontal axis as itis now in the z direction the same error quantity can beexpected

43 Dynamic Accuracy

Figure 10 demonstrates that lag can be measuredduring dynamic tracking of the robot arm system Thedual-detector system on the robot arm was moved fromone location to another horizontally and vertically asshown in the rst and second columns respectively Therobot arm moved from detection circuit position (a) tothe position (c) generating a horizontal shift due to thesystem lag as shown in (b) Moving the robot arm upfrom (d) to (f) vertically generates a vertical shift of thecrosshair below the detectors as shown in (e) The ve-locities of the detectors at the photographed target mo-tions in (b) and (e) were both approximately 100 mmsec (as determined from the robot arm readouts)

To measure the overall system lag the images at thepositions (b) and (e) were magnied to quantify thespatial or dynamic error The 15 mm error along bothaxes and the known velocities of the robot arm indicatethat the overall system lag is approximately within oneframe or 1667 ms which is from the display system

Because the detected pixel and line numbers in thecurrent frame are displayed in the following displayframe an inevitable display lag of up to 1667 ms oc-curs This is conrmed by the experimental result Fur-thermore as expected because the computational ex-

Chinthammit et al 15

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 16: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

pense for the detection is minimal its contribution tooverall system lag is negligible In the future the com-putational time can be further reduced using a real timesystem and a faster computer when multiple dual-de-tector systems are used Furthermore a system lag of1667 ms suggests that prediction algorithms andorhigher display refresh rates are to reduce latency

44 Future Work

We continue to develop the proposed shared-aper-ture tracking display into a complete head-tracking systemwith six DOF In subsection 422 we demonstrate thatsix DOF of the VRD can be estimated by an image-analy-sis technique A noticeable error is also reported in theexperiment Therefore we continue to develop the neces-sary algorithms to achieve a higher tracking accuracy

The proof-of-concept system does not have a real-time capability and therefore results in tracking errorCurrently we are considering alternative hardware solu-tions to upgrade the current system into the real-time

system Furthermore the effect of the electromechanicaljitter is not well studied A mathematical error model ofthe jitter can be used to improve our tracking perfor-mance (Holloway 1997) but we rst need to quantifythe error and then model it

In this paper the reticle generator subsystem is lim-ited to two DOF Therefore a user cannot fully experi-ence a sense of reality with augmented images Six DOFis required Currently we have been developing a six-DOF display subsystem that can communicate with thetracking system to display images with six DOF

Lastly the update rate of the shared-aperture systemcannot exceed the VRD refresh rate of 60 Hz In mili-tary cockpit applications rapid head movements oftenoccur So an optical tracking system alone is not idealfor these applications To solve this problem we areconcurrently developing a hybrid tracking system basedon the shared-aperture optical tracking system presentedin this paper and inertial sensors The goal is to increasethe measurement of head position to rates of 1000 Hzwhile keeping the VRD update rate at only 60 Hz

Figure 10 Dynamic error

16 PRESENCE VOLUME 12 NUMBER 1

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 17: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

5 Conclusions

The SARSDT is a VRD that has been modied toprovide a tracking function using coaxially the opticalpath of the VRD In this way the proposed trackingtechnology shares the same aperture or scanned opticalbeam with the visual display

The tracking precision is limited only by the stabilityof the detection circuit Currently the stabilities are12 005 and 12 001 deg in the horizontal andvertical axis respectively Ofine analysis demonstratesthat the horizontal stability can be reduced to 12

002 deg if the partial elimination of the scanner jittereffect and ideal acquisition time is manually and prop-erly imposed Implementation of peak-detection tech-niques can also improve the tracking precision The redcrosshair pattern from the VRD was superimposed ontothe reference detector to demonstrate tracking in thetime domain The static error was measured to be 00812004 and 003 12002 deg for the horizontaland vertical axes respectively The error was determinedto be due to a calibration error in the display system andthe sinusoidal scan motion in the horizontal axis

One advantage of the current tracking system in thetime domain is that the technique can be operated in a3D space without having to determine the absolute po-sitions Therefore the computational efciency is ex-tremely high Consequently this technique is effectivefor some AR applications such as ldquoSticky Fingerrdquo(Pierce et al 1997) and ldquoinformation cockpitsrdquo (Foxlinamp Harrington 2000) in which optical detectors or re-ectors can be attached to the object or environmentThe dynamic error is mainly due to the display lag cur-rently set at a 60 Hz refresh rate The experiment sup-portively indicates the system lag to be within one dis-play frame (1667 ms) (See gure 10)

Tracking in the time domain allows the AR system tosuperimpose an image only at the detector locationTracking in the spatial domain is a solution to this prob-lem and is therefore required In Chinthammit et al(2002) the result demonstrates that difculties are in-herent in transforming coordinates in the time domainto coordinates in the spatial domain Problems withtransformation are primarily due to the orientation off-

set between the IR scanned eld and the world space in3D space The solution to this problem is to determinea six-DOF transformation between the robot arm coor-dinates and the VRD coordinates Therefore we haveimplemented an image-analysis tracking technique onour current optical tracking system to determine thesix-DOF target movement The results indicate a feasi-bility of such technique for other applications such ashead tracking however more extensive development isrequired to achieve a very high tracking accuracy

Acknowledgments

We would like to thank Reinhold Behringer (Rockwell Scien-tic Company) for the robot arm software interface and Rob-ert A Burstein for the 2D reticle generator I thank Chii-Chang Gau Michal Lahav Quinn Smithwick Chris Brownand Dr Don E Parker for valuable contributions in the prep-aration of this paper The Ofce of Naval Research funded thisresearch with awards N00014-00-1-0618 and N00014-96-2-0008

References

Azuma R (1997) A survey of augmented reality PresenceTeleoperators and Virtual Environments 6(4) 355ndash385

Azuma R amp Ward M (1991) Space-resection by collinearityMathematics behind the optical ceiling head-tracker UNCChapel Hill Department of Computer Science Tech Rep TR91ndash048

Chinthammit W Burstein R Seibel E J amp Furness T A(2001) Head tracking using the virtual retinal display TheSecond IEEE and ACM International Symposium on Aug-mented Reality Unpublished demonstration available online at (httpwwwhitlwashingtoneduresearchivrdmovieISAR01_demomov)

Chinthammit W Seibel E J amp Furness T A (2002)Unique shared-aperture display with head or target track-ing Proc of IEEE VR2002 235ndash242

Foxlin E amp Harrington M (2000) WearTrack A self-referenced head and hand tracker for wearable computersand portable VR The Fourth Intl Symp on Wearable Com-puters 155ndash162

Holloway R (1997) Registration error analysis for aug-

Chinthammit et al 17

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1

Page 18: A Shared-Aperture Tracking - Columbia Universitydrexel/CandExam/Chinthammit2003_VRD.pdf · Display for Augmented Reality Abstract The operation and performance of a six degree-of-freedom

mented reality Presence Teleoperators and Virtual Environ-ments 6(4) 413ndash 432

Janin A Zikan K Mizell D Banner M amp Sowizral H(1994) A videometric head tracker for augmented realityapplications SPIE Telemanipulator and Telepresence Tech-nologies 2351 308 ndash315

Johnston S amp Willey R (1995) Development of a com-mercial retinal scanning display Proceedings of the SPIEHelmet- and Head-Mounted Display and Symbology DesignRequirements II 2465 2ndash13

Kato H amp Billinghurst M (1999) Marker tracking andHMD calibration for a video-based augmented reality con-ferencing system Proc of the 2nd IEEE and ACM Interna-tional Workshop on Augmented Reality 85ndash94

Kelly J Turner S Pryor H Viirre E Seibel E J amp Fur-ness T A (2001) Vision with a scanning laser displayComparison of icker sensitivity to a CRT Displays 22(5)169ndash175

Kutulakos K amp Vallino J (1998) Calibration-free aug-mented reality IEEE Trans Visualization and ComputerGraphics 4(1) 1ndash20

Pierce J Forsberg A Conway M Hong S Zeleznik Ramp Mine M (1997) Image plane interaction techniques in3D immersive environments Symposium on Interactive 3DGraphics 39ndash44

Rolland J Davis L amp Baillot Y (2001) A survey of track-ing technology for virtual environments In W Bareld ampT Caudell (Eds) Fundamentals of Wearable Computers

and Augmented Reality (pp 67ndash112) Mahwah NJ Law-rence Erlbaum

Sorensen B Donath M Yang G amp Starr R (1989) TheMinnesota scanner A prototype sensor for three-dimen-

sional tracking of moving body segments IEEE Transac-tions on Robotics and Automation 5(4) 499 ndash509

Stetten G Chib V Hildebrand D amp Bursee J (2001)Real time tomographic reection Phantoms for calibration

and biopsy Proceedings of the IEEE and ACM InternationalSymposium on Augmented Reality 11ndash19

Yokokohji Y Sugawara Y amp Yoskikawa T (2000) Accu-rate image overlay on video see-through HMDs using vi-

sion and accelerometers Proc of IEEE VR2000 247ndash254You S amp Neumann U (2001) Fusion of vision and gyro

tracking for robust augmented reality registration ProcIEEE VR2001 71ndash78

You S Neumann U amp Azuma R (1999) Hybrid inertialand vision tracking for augmented reality registration Pro-ceedings of IEEE VR 999 260ndash267

Wine D Helsel M Jenkins L Urey H amp Osborn T

(2000) Performance of a biaxial MEMS-based scanner formicrodisplay applications Proceedings of SPIE 4178 186ndash196

Zeleznik R LaViola Jr J Feliz D amp Keefe D (2002)

Pop through button devices for VE navigation and interac-tion Proceedings of the IEEE Virtual Reality 2002 127ndash134

18 PRESENCE VOLUME 12 NUMBER 1