

IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, VOL. 39, NO. 1, JANUARY/FEBRUARY 2003 21

Stereo Vision in LHD Automation

Mark Whitehorn, Student Member, IEEE, Tyrone Vincent, Member, IEEE, Christian H. Debrunner, Member, IEEE, and John Steele, Member, IEEE

Abstract—This paper details work in applying stereo vision for the enhancement of safety and productivity in the operation of a load–haul–dump (LHD) vehicle in underground mining. The primary goal of this portion of the research is to provide three-dimensional (3-D) models of the LHD's environment. Availability of these models facilitates performance of automated or teleoperated loading tasks and enhances safety through identification and location of humans in the path of the vehicle. Generation of an accurate 3-D model of the immediate surroundings of the LHD is accomplished through processing of stereo visual imagery. Stereo video is acquired using a pair of digital cameras mounted above the cab of the LHD. The video data are processed into a dense depth map plus confidence information. These depths and the stereo rig calibration data are then used to construct a 3-D surface model. We demonstrate useful models obtained under both well-illuminated and low-light conditions.

Index Terms—Machine vision, mining, modeling, stereo.

I. INTRODUCTION

BY ITS NATURE, mining involves heavy equipment and large forces. This type of environment is not conducive to the safety and health of those who do the work unless they are removed from the point of direct application of these forces. This project is focused on loading automation for load–haul–dump vehicles (LHDs). While several automation demonstration projects are under development, production LHDs are all manually operated at the time of this writing, with the exception of Inco mines in Sudbury, ON, Canada, and LKAB's¹ Kiruna Mine in Sweden, where the tramming and dumping tasks have been automated and the loading operation is done via remote control (see Section IV for more detail on the current state of the art). The application of advanced technology to LHDs has thus far been limited to the haul and dump tasks. Loading has proven the most difficult to automate because it requires perception of the shape of the muckpile in order to plan the efficient and safe removal of each scoop. Navigation in the vicinity of the muckpile during the loading operation is also complicated by the dynamic nature of the environment. A three-dimensional (3-D) model of the muckpile provides the necessary information for automation of the loading operation, and better quality information for remote operators.

Paper PID 02–43, presented at the 2001 Industry Applications Society Annual Meeting, Chicago, IL, September 30–October 5, and approved for publication in the IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS by the Mining Industry Committee of the IEEE Industry Applications Society. Manuscript submitted for review October 15, 2001 and released for publication October 22, 2002.

The authors are with the Colorado School of Mines, Golden, CO 80401-1843 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIA.2002.807245

¹Luossavaara-Kiirunavaara Aktiebolag (LKAB) is a mining company owned by the Swedish State.

TABLE I
FATAL ACCIDENTS UNDERGROUND INVOLVING POWER HAULAGE

Our current effort is just one example of the application of advanced sensing to move a miner further from harm's way and allow her/him to operate and manage mining equipment from a healthier, less stressful environment. For example, we believe stereo vision sensing and its fusion with other sensory data will allow us to move the coal miner away from the long-wall shearer to a location that is both safer and healthier while simultaneously improving control over the long-wall operation.

II. PROBLEM STATEMENT

The goal of this project is to improve the health and safety of underground miners. The approach taken is to move the operators of LHDs to remote locations, away from the vehicle, where they can telemanage the operation of the LHD. This has the following benefits:

  • reducing risk of accident;
  • reducing exposure to hydrocarbon particulates;
  • reducing exposure to repetitive shock loading.

This project is focused on developing the stereo vision and 3-D modeling techniques required to build models of the underground environment to enable automation of the LHD loading operation. A 3-D model of the muckpile (and surrounding area) is necessary for planning the location of each scoop, monitoring the slope of the face, and obstacle avoidance. A broader goal is to apply this technology to other mining situations and environments, and to understand the requirements of implementing this technology in a number of mining venues.

III. HEALTH AND SAFETY MOTIVATION

Operation of power haulage equipment is a major source of fatal injuries in the mining industry. Table I shows the number of fatal accidents for each of the years 1995–2000 that were associated with underground power haulage. Fourteen miners have been killed since 1995. These numbers include only metal/nonmetal underground operations. If we include the numbers for underground coal involving power haulage, we would see a significant increase. The systems being developed for this project will be applicable to all of these situations. If the equipment can be automated such that the miner is moved to a remote location, both his/her safety and health will be improved. In addition, many current implementations of remote control do not provide sufficient information for the operator to make decisions

0093-9994/03$17.00 © 2003 IEEE


without significant stress. By developing better methods of presenting information to the miner, we intend to help him/her do a more effective job and reduce the work stress and the temptation to travel into unsafe areas in order to gather more information about a mining operation.

As reported in [1], visibility is a major problem for LHD operators. The author reports approximately 160 accidents/incidents involving underground LHD equipment in the province of Ontario, Canada. From 1986–1996 in Ontario, there were ten fatal accidents involving LHDs, in which either the operator or a pedestrian struck by the vehicle was killed. Of the ten fatalities, five involved lack of good visibility on the part of the operator. The ability of vision sensors to detect people in the work area will also enable automatic safety measures to reduce the risk of this type of accident.

IV. LITERATURE SURVEY

The focus of this research is to develop methods for stereo vision sensing such that it can be used to build 3-D models of underground mining operations. Such systems will be invaluable in developing automated mining systems, and thus moving miners from hazardous environments to remote operational locations that will be both safer and healthier. Thus, the literature survey covers both automation of underground mining equipment as well as stereo vision applied to mining and construction.

Chadwick [2] has reported on development of new automated LHD systems at Sandvik Tamrock's test mine in Tampere, Finland, and at LKAB's Kiruna iron ore mine in Sweden. At the Kiruna mine, LHDs are now being operated remotely and the tramming and dumping operations are done automatically. The automated tramming uses reflectors that are suspended from the sides of the drifts so that the LHD can follow the desired path. In the system being developed at Sandvik Tamrock's Tampere mine, the navigation is done using two-dimensional (2-D) models of the drift wall profile and dead reckoning. The wall profile is detected using two laser scanners, one on the front and one on the rear of the LHD. This system will be able to automate the bucket operation for a single load and allow for automated tramming of the vehicle. Note that the task of load planning and multiple loading sequences is not part of this system. A more detailed discussion of the state of the art in underground mining is given in [3].

Chadwick [4] has also reported on the ongoing development of autonomous LHDs in Australia by Caterpillar Elphinstone. This project is an integration of work done at CSIRO [5] in automated guidance using laser scanners, and AutoDig [6], an automated bucket loading system for front-end loaders and LHDs. The AutoDig system automates only the scooping operation; it does not sense the shape of the muckpile, and requires a human operator to control the approach for each scoop. Caterpillar is integrating these two subsystems into two Elphinstone LHDs (an R1600 at WMC Resources' Leinster nickel mine and a larger R2900 at WMC's Olympic Dam, also in South Australia).

Stereo has been used in computer vision for many years, and is well summarized in [7, Ch. 6]. More recently, work on mapping the environment from multiple views (from stereo or motion) has focused on achieving high accuracy over long image sequences using a batch process [8]–[10]. Other approaches [11]–[14] compute the structure from stereo and motion simultaneously (generally also as a batch process). These batch approaches are well suited to applications such as video compression or mixing of synthetic and real imagery (e.g., in the movie and television industries), but they do not directly take advantage of stereo sensing, and due to their computational costs and batch nature they are not well suited to navigation applications. In navigation, the more common approach is typified by Thrun, Burgard, and Fox [15], in which 3-D structure is extracted from range data and the results are integrated over time. Another class of approaches uses incremental methods such as a Kalman filter to update the 3-D model over time [16]–[18], but these approaches have generally been applied to monocular sequences rather than stereo. In principle, however, they could apply to stereo as well as monocular sequences.

V. 3-D MODELING FOR REMOTE CONTROL AND AUTOMATION OF LHDS

We have chosen stereo imaging techniques to build accurate 3-D surface representations of an underground mine environment. We show that even in low-light situations, useful 3-D models can be constructed from visual imagery obtained with color charge-coupled device (CCD) cameras. The methods developed here are different from approaches taken in other LHD automation research, i.e., we are using standard digital video cameras to develop 3-D information; others have used laser range finders to develop 2-D models. Compared to 2-D rangefinders, stereo imaging provides much more information per unit time; the rangefinder must scan a single beam over the scene while cameras image the entire field of view at video frame rates of up to several hundred hertz. Our prototype system can collect stereo imagery at up to 15 frames/s at 640 × 480 pixels. This speed and resolution should support efficient planning and control of the LHD loading operation.

As demonstrated in [7], correlation-based stereo can be used to obtain dense depth maps at high speed and high resolution using relatively simple algorithms and hardware. Although current algorithm development is proceeding in Java to facilitate rapid implementation, we anticipate real-time implementation of our algorithms for practical application in vehicle control. High-speed microprocessors, digital signal processor (DSP) chips, and reconfigurable field-programmable-gate-array (FPGA)-based processors today make such a system practical and inexpensive. For efficiency, we exploit several constraints imposed by the stereo rig and assumptions on scene geometry. The epipolar constraint is used to reduce the correlation search to one dimension. The result of region matching is a disparity image from which the depth of scene points may be calculated. We also demonstrate good performance in detecting outliers in the disparity image and rejecting the resulting invalid depth measurements.


Fig. 1. Stereo vision system on LHD at CSM’s Edgar Experimental Mine.

A. Vision System Implementation

We have modified a Firewire camera device driver from the Microsoft Platform Software Development Kit (SDK) to support two Sony DFW-VL500 cameras with remote focus, zoom, and iris controls. The data acquisition system runs under Windows 2000 on a notebook PC. For synchronized collection of left and right images the device driver was also modified to put the cameras into external trigger mode. All test imagery was acquired from an LHD operating in CSM's Edgar Experimental Mine in Idaho Springs, CO (Fig. 1). The cameras are mounted above the cab of the LHD, and angled inward and downward with field of view set to 40° (2.25 m horizontal at a distance of 3 m).

1) Stereo Rig Calibration: A calibrated stereo rig has accurately specified intrinsic and relative extrinsic parameters for both cameras. Our intrinsic camera parameters specify a pinhole camera model with radial distortion. The pinhole model is characterized by its focal length, image center, and pixel spacing in two dimensions. The radial distortion is characterized by a single parameter. The extrinsic parameters describe the position and orientation of each camera as a six-degree-of-freedom pose relative to the left camera coordinate system. Intrinsic parameters for a given camera are constant, assuming the physical parameters of the optics do not change over time and, thus, may be precalculated. Extrinsic parameters depend on the relative camera poses and will be constant if the cameras are fixed relative to one another. We use the Tsai algorithm [19], [20] and a nonplanar calibration target with concentric contrasting circles (CCCs) [21], [22] for offline calibration of the stereo rig.
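The projection side of such a camera model can be sketched as below. This is a minimal illustration, not the paper's implementation: the parameter names (`f`, `cx`, `cy`, `sx`, `sy`, `kappa`) are hypothetical stand-ins for the intrinsic parameters described above, and the distortion inverse is a small-`kappa` approximation rather than Tsai's exact formulation.

```python
def project(point_cam, f, cx, cy, sx, sy, kappa):
    """Project a 3-D point in the camera frame to pixel coordinates using
    a pinhole model with a single radial distortion parameter.

    f: focal length (m); cx, cy: image center (pixels);
    sx, sy: pixel spacing (m/pixel); kappa: radial distortion (1/m^2).
    """
    X, Y, Z = point_cam
    # Ideal (undistorted) metric image-plane coordinates.
    xu, yu = f * X / Z, f * Y / Z
    # Tsai relates distorted to undistorted coordinates via the radius;
    # for small kappa we approximate the inverse mapping directly.
    r2 = xu * xu + yu * yu
    xd, yd = xu / (1 + kappa * r2), yu / (1 + kappa * r2)
    # Convert metric image coordinates to pixel coordinates.
    return cx + xd / sx, cy + yd / sy
```

With `kappa = 0` this reduces to the plain pinhole model, which is a useful sanity check when validating a calibration pipeline.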

2) Image Preprocessing: Especially in low-light conditions, intensity noise is observable in the stereo images. This noise can cause errors in the correlation process. To reduce the effect of this noise, both low-pass and median filters may be applied to the intensity values prior to further processing. A 5 × 5 Gaussian filter [23] is used for the low-pass operation. A 5 × 5 square-mask median filter also seems to be effective at improving the character of the correlation curves discussed below.
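The median-filtering step can be sketched as follows. This is an illustrative, standard-library-only version with the 5 × 5 mask size from the text; a production pipeline would use an optimized image library and handle the image border more carefully than leaving it unchanged.

```python
from statistics import median

def median_filter(img, k=5):
    """Apply a k x k median filter to a 2-D list of intensities.

    Pixels within k//2 of the border are left unchanged in this sketch.
    """
    h, w, r = len(img), len(img[0]), k // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [img[y + dy][x + dx]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = median(window)   # suppresses isolated noise spikes
    return out
```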

In order to make application of the epipolar constraint efficient, the left and right images are rectified by reprojecting them both to a common plane. Once the stereo rig has been calibrated, it is possible to reduce the complexity of the stereo correlation task by rectifying the pair of images. This process projects both images to a common image plane; this is equivalent to a stereo rig geometry in which both cameras have the same orientation. With no rotation between camera coordinate frames, the epipolar plane formed by the two camera centers and an object point intersects each image plane in a horizontal line. This guarantees that corresponding image points lie in the same pixel row of each image and simplifies the correlation search.

3) Measuring Disparity: We use a correlation-based algorithm [7] for matching a given pixel in the left image with its corresponding pixel in the right image. Each point correspondence between the two views of the scene, together with the calibration parameters, allows computation of the range to a scene point. After rectification, disparity is simply the difference in image plane coordinates of corresponding left and right pixels (see [24, Sec. 6.2.3.1]), and the depth of an image feature is a direct function of the disparity

Z = bf/d,   d = x_l − x_r   (1)

where d is the disparity, x_l and x_r are the pixel coordinates in the left and right images, b is the stereo rig baseline (distance between camera projection centers), f is the focal length, and Z is the depth to the scene for this pixel. Rather than compare individual pixels between images, we use a correlation window of fixed size to compare regions of pixels in each image. Reference [7] defines six different correlation measures; we have implemented all of them in order to compare their performance on our imagery. Each correlation measure provides a distance metric relating windows in the left and right images and is based on one of: normalized cross-correlation, sum of absolute differences, or sum of squared differences. For example, the zero-mean sum of squared differences (ZSSD) measure is defined as

ZSSD(x, y, d) = Σ_{(i,j)∈W} [Ī_l(x+i, y+j) − Ī_r(x+i−d, y+j)]²   (2)

where (x, y) is the location of a pixel on the image plane, d is the disparity, and W is the n × n correlation window. The image intensities Ī result from subtracting the mean intensity over an n × n region from each pixel in the image

Ī(x, y) = I(x, y) − (1/N) Σ_{(i,j)∈W} I(x+i, y+j)   (3)

where N = n² is the number of pixels in the window. Fig. 2 shows the ZSSD measure applied to a typical muckpile image (the overlaid white boxes are the correlation windows, which are 15 × 15 pixels). The correlation window is centered on a particular pixel in the left image and slides along the corresponding epipolar line in the right image. To further reduce the amount of computation, only the portion of the epipolar line that corresponds to desired range values is examined. For each position of the window along this portion of the epipolar line, a value for the correlation measure is computed. This results in a graph of the correlation as a function of range. Once this curve has been obtained, the peak is interpolated to subpixel resolution by fitting a quadratic. The peak gives the location of the matching pixel in the right image


Fig. 2. ZSSD correlation measure superimposed on left and right images.

Fig. 3. Disparity image: 15 × 15 correlation window.

to subpixel precision and is used in (1) to calculate the depth of the scene.
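The window search and subpixel refinement described above can be sketched as follows. This is a simplified illustration, not the authors' Java implementation: `zssd` follows the zero-mean SSD form of (2)–(3), `best_disparity` slides the window along a rectified row and refines the minimum with a three-point quadratic fit, and `depth_from_disparity` applies (1). Images are plain 2-D lists of intensities; function names are hypothetical.

```python
def zssd(left, right, x, y, d, n):
    """Zero-mean SSD between n x n windows centered at (x, y) in the
    left image and (x - d, y) in the right image."""
    r = n // 2
    wl = [left[y + j][x + i] for j in range(-r, r + 1) for i in range(-r, r + 1)]
    wr = [right[y + j][x - d + i] for j in range(-r, r + 1) for i in range(-r, r + 1)]
    ml, mr = sum(wl) / len(wl), sum(wr) / len(wr)   # window means, as in (3)
    return sum(((a - ml) - (b - mr)) ** 2 for a, b in zip(wl, wr))

def best_disparity(left, right, x, y, d_min, d_max, n=5):
    """Search the rectified epipolar row for the minimum-ZSSD disparity,
    then refine it by fitting a quadratic through the three scores
    around the minimum (subpixel interpolation)."""
    scores = [zssd(left, right, x, y, d, n) for d in range(d_min, d_max + 1)]
    k = scores.index(min(scores))
    offset = 0.0
    if 0 < k < len(scores) - 1:
        s0, s1, s2 = scores[k - 1], scores[k], scores[k + 1]
        denom = s0 - 2 * s1 + s2
        if denom > 0:
            offset = 0.5 * (s0 - s2) / denom   # vertex of the fitted parabola
    return d_min + k + offset

def depth_from_disparity(d, baseline_m, focal_px):
    """Eq. (1): Z = b * f / d for a rectified rig (Z in meters)."""
    return baseline_m * focal_px / d
```

Note that a true minimum of a distance-style measure such as ZSSD is being located here; for a similarity measure such as normalized cross-correlation one would search for the maximum instead.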

The disparity image computed from the stereo pair of Fig. 2 is shown in Fig. 3.

The probability of a correlation mismatch decreases as the size of the correlation window and the amount of texture increase, but the achievable range accuracy is less for larger windows due to averaging effects [25]. We find that, for the muckpile imagery, we achieve high disparity density and low mismatch percentages with n = 15. This corresponds to a correlation window size of about 5 cm × 5 cm at 3 m; depth variations over spatial regions smaller than this will be blurred. Multiresolution techniques can improve depth resolution by narrowing the disparity search range and applying a smaller correlation window to refine the disparity measurement.

B. 3-D Modeling and Visualization

We next build 3-D surface models using the disparities obtained from correlation. The disparity image is first filtered to remove outliers using the disparity gradient constraint [24] and (optionally) erosion. In regions of the scene which are approximately perpendicular to the line of sight, correlation-based stereo generates dense, reliable disparities. Disparity values in sparsely populated regions of the disparity image are not as reliable, and erosion tends to remove these values. The erosion operation is implemented by examining the eight nearest neighbors of a pixel. If any one of these depth values is invalid, the depth measurement at this pixel is also labeled invalid.
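This erosion rule can be sketched as follows; the representation is hypothetical (`None` marks an invalid disparity), but the eight-neighbor logic is as described above.

```python
INVALID = None   # hypothetical marker for an invalid disparity

def erode_disparity(disp):
    """Invalidate any disparity whose 8-neighborhood contains an invalid
    value, trimming unreliable, sparsely supported disparities."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(h):
        for x in range(w):
            if disp[y][x] is INVALID:
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w \
                            and disp[ny][nx] is INVALID:
                        out[y][x] = INVALID   # neighbor invalid: reject pixel
    return out
```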

The depth of each object point is computed using (1) and the corresponding 3-D scene point is back-projected using the stereo rig calibration data. For visualization, Delaunay triangulation is then used to generate a triangular mesh surface model. These triangles are further filtered to eliminate those with large depth gradients. The final step, for quality control and presentation purposes, is texture-mapped 3-D visualization of the surface. This visualization of the surface provides a qualitative indication of the accuracy and utility of the stereo range data.
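Computing a per-triangle normal, as the renderer needs for shading each planar element, reduces to a cross product of two edge vectors; a minimal sketch:

```python
def triangle_normal(a, b, c):
    """Unit normal of a triangle with vertices a, b, c (3-D tuples),
    via the cross product of two edge vectors."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],      # cross product u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    return [component / length for component in n]
```

The sign of the normal depends on vertex winding order, so a mesh builder must keep the winding consistent across triangles for the lighting to be correct.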

We use the Java 3-D extension to Java 2 for 3-D visualization. The data object used to represent our 3-D surface is an instance of the Shape3D class. This object contains the set of triangles obtained via Delaunay triangulation from the depth image, along with a set of normals to the triangles. The surface normals are used in rendering an image of the surface to model the reflection of light toward the camera. In order to aid in visualization and verify the correctness of the surface model, the image from the left camera may also be applied as a texture map to assign intensity values to the surface. The left image is used because we construct the surface model in the left camera coordinate frame. For each triangle vertex in the surface model we specify a corresponding pixel coordinate in the left image. This allows the Java 3-D renderer to accurately estimate the image intensity corresponding to each planar element of the surface when rendering a specific view. The renderer may be set to provide a blend of the texture map intensity and the surface model appearance for purposes of visualization. This allows viewing the orientation of the surface in addition to the reference image.

C. 3-D Model Results

Visualizations of the resulting surface models are shown in Figs. 4 and 5. The texture-mapped 3-D model built from the stereo pair of Fig. 2 is displayed from three different perspectives: left, right, and center. Each triangle vertex in the surface mesh represents the average of all 3-D points within an (x, y) bin which is 25 mm square. This averaging is performed mainly to reduce the runtime of the Delaunay triangulation; a typical stereo pair provides about 100 K 3-D scene points with (x, y) spacing on the order of 3 mm. The averaging reduces the number of 3-D points by a factor of about 64. Depth precision is typically


Fig. 4. Surface models: left view and right view.

Fig. 5. Surface models: center view (left) and textured close-up (right).

Fig. 6. Portion of 3-D point cloud.

about 5 mm. A small region of the 3-D point cloud is shown in Fig. 6; each point is represented by a box which is 1 mm on all sides. The density and quality of the 3-D point cloud are much greater than necessary for the construction of this low-resolution surface model; we can efficiently (in future work) utilize all of the 3-D data for temporal integration and precise navigation.
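The bin-averaging step that decimates the cloud before triangulation can be sketched as follows (a standard-library illustration; function and parameter names are hypothetical):

```python
from collections import defaultdict

def bin_average(points, bin_mm=25.0):
    """Average all 3-D points (coordinates in mm) that fall into the same
    bin_mm x bin_mm (x, y) bin; returns one representative point per bin."""
    bins = defaultdict(list)
    for x, y, z in points:
        bins[(int(x // bin_mm), int(y // bin_mm))].append((x, y, z))
    out = []
    for pts in bins.values():
        n = len(pts)
        # Component-wise mean of every point in the bin.
        out.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))
    return out
```

With ~3 mm point spacing and 25 mm bins, each bin collects on the order of 8 × 8 points, consistent with the factor-of-64 reduction quoted above.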

After performing triangulation, we cull triangles which span more than five bins in x or y. The effect of culling triangles that span regions of missing data is apparent in the gaps (black regions) in the surface. Individual triangles are clearly visible

in the surface, and the corners provide an indication of the average value of depth over the 25 × 25 mm bins while the triangle faces provide an indication of surface slope. Note that the slope of the face (of the muckpile) is readily determined from the side views of the model, and that there is much surface detail. This model was computed from a single stereo pair, and contains no useful information near depth discontinuities or in regions that are not visible to both cameras. The lower half of Fig. 5 is a closer view of a large rock in the foreground near the bottom of the pile. This surface mesh has had the left camera image texture mapped onto it to show that the surface shape corresponds correctly to the image of the rock. Here the black regions correspond mainly to portions of the rock which were parallel to one (or both) cameras' line of sight and, therefore, not visible in both images of the stereo pair.
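The five-bin culling rule can be sketched as below (an illustrative reading of the text, with hypothetical names; triangles are triples of (x, y, z) vertices in mm):

```python
def cull_triangles(triangles, bin_mm=25.0, max_span_bins=5):
    """Drop any triangle whose x- or y-extent exceeds max_span_bins bins,
    removing faces that bridge regions of missing data."""
    kept = []
    for tri in triangles:
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        if (max(xs) - min(xs) <= max_span_bins * bin_mm and
                max(ys) - min(ys) <= max_span_bins * bin_mm):
            kept.append(tri)          # compact triangle: keep it
    return kept                       # oversized triangles become gaps
```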

Integration of data from multiple viewpoints supplies additional information, providing additional detail in the surface model and aiding in the elimination of erroneous data. Algorithms for registering surface models computed from multiple views, and for estimating the vehicle motion between successive viewpoints, are available and will be integrated with the 3-D modeling software to accomplish this. Section V-D describes preliminary work on temporal integration; future work will also include temporal integration into full 3-D volumetric models.

Fig. 7 (left) shows the left camera's view of a person standing in front of the LHD at a distance of about 4 m, with only the LHD


Fig. 7. Low-light image of person at 4 m (left) and detected 3-D surface of person (right).

headlights for illumination. The resulting 3-D model (Fig. 7, right) provides information for only the regions with adequate lighting, and demonstrates high depth resolution in low light. The 3-D surface model of the person spans a depth range of approximately 35 cm; only the frontal view of the model is shown. This model demonstrates that the system is capable of clearly sensing the presence of a human at a distance of 4 m, and this capability should be usable to enhance safety for miners.

The results obtained from these first data collections in the Edgar Mine are quite encouraging. While the color CCD cameras we used are by no means the most sensitive cameras available, the system performs quite well at low illumination levels. Also, the wide-baseline stereo configuration provides good depth resolution and works effectively at the nominal 3 m distance for mapping the muckpile. It is clear that a narrower baseline would be useful for working at shorter distances, since the performance of the wide-baseline rig degrades for objects closer than about 2 m because the two cameras see very different aspects of a nearby object.

D. Integrating 3-D Data Into Digital Elevation Models

The result of processing stereo video is a set of estimates of the 3-D locations of the surfaces of objects in the scene. This information is further processed in order to create a digital elevation model (DEM) of the operating area and to determine the location of the LHD relative to this map. This map can be used in the user interface to give a realistic representation of the operating area of the LHD, as well as in lower level task planning for autonomous modes. Map building for navigation has been a central problem in mobile robot research (e.g., [26]–[30]), while building 3-D models from images for activities such as object recognition has been an area of interest in computer vision.

A DEM uses a fixed 2-D grid with variable height to represent the 3-D world. It is usually termed a 2½-D representation, since only a 2-D surface living in a 3-D world can be represented, but this is adequate for most navigation and obstacle avoidance applications. If necessary, a second grid can be used to represent the ceiling topography. 3-D data can be integrated over time by modifying the DEM as new information becomes available [27]. Currently, a uniform grid is used, but extensions to nonuniform grids can represent information at different scales. 3-D location information that is produced by the stereo algorithm contains two types of information. There is

Fig. 8. Determining empty and filled space from 3-D information.

information that an object exists at a point plus the implicationthat no object exists between the camera position and that point,since otherwise the point would be occluded. In particular, con-sider Fig. 8. A ray is drawn between a pixel in one camera’simage plane and the corresponding 3-D location (, , ) foundvia the stereo algorithm. The map grid is drawn below. Consid-ering for a moment only objects on the floor, this ray indicatesthat there is an object of height at leastat the grid locationwhere the ray terminates. The column rising from the floor il-lustrates this. This object may actually be taller, but additionaldata will determine that. There is also information that everygrid square that the ray intersects between the object and thecamera cannot contain an object of height more than. Thelighter shaded columns from the ceiling are drawn to indicatethis.
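The two kinds of evidence can be sketched as a simple ray walk over the grid. The cell size, helper names, and stepping scheme below are our own illustrative choices, not the paper's implementation: each stereo point raises the occupied-height bound in its terminal cell, and caps the free-space bound in every cell the viewing ray crosses on the way there.

```python
# Sketch (assumed geometry/names): update floor (max-height) and
# ceiling (free-space) DEM grids from one stereo point.
import math
from collections import defaultdict

CELL = 0.25  # grid resolution in meters (hypothetical)

def cell_of(x, y):
    return (int(math.floor(x / CELL)), int(math.floor(y / CELL)))

def update_dem(max_h, min_h, cam, point, steps=100):
    cx, cy, cz = cam
    px, py, pz = point
    end = cell_of(px, py)
    # An object of height at least pz exists where the ray terminates.
    max_h[end] = max(max_h[end], pz)
    # Walk the ray: intermediate cells cannot hold anything taller than
    # the ray's height above them, else the point would be occluded.
    for k in range(1, steps):
        t = k / steps
        c = cell_of(cx + t * (px - cx), cy + t * (py - cy))
        if c == end:
            continue
        ray_z = cz + t * (pz - cz)
        min_h[c] = min(min_h[c], ray_z)

max_h = defaultdict(lambda: 0.0)            # highest observed surface per cell
min_h = defaultdict(lambda: float("inf"))   # free-space ceiling per cell
update_dem(max_h, min_h, cam=(0.0, 0.0, 1.75), point=(3.0, 0.0, 0.5))
```

A real implementation would use an exact grid traversal rather than fixed sampling steps, but the bookkeeping of the two bounds is the same.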

This processing of 3-D data merges information into the model, which may already contain information from previous images. Along with the height information is a measure of the certainty, or quality, of each point. The certainty information may be modified by factors such as noise in the 3-D measurement or uncertainty in the registration process.

For further use and for compression, the data representation is converted to a function that captures both the magnitude and certainty information. Denote the maximum height data for a grid cell by heights h_i with certainties c_i, i = 1, ..., N. The characterization function for the maximum height data is defined as

f_max(z) = Σ_{i=1}^{N} c_i max(h_i − z, 0).    (4)

A similar characterization function represents the minimum height data g_j with certainties d_j, j = 1, ..., M:

f_min(z) = Σ_{j=1}^{M} d_j max(z − g_j, 0).    (5)

The resulting function for the data in Fig. 9 is shown in Fig. 10. Note that, because of the linear dependence on the data, the function for minimum heights was not greatly affected by the outliers. For data reduction, this function is approximated by a piecewise linear function with a fixed number of terms; that is, a small number (about ten) of new height and certainty parameters are calculated which approximate the full data function. New data simply modify these parameters, and the raw data can be discarded.

Fig. 9. Histogram for typical grid point.

Fig. 10. Height characterization for typical grid point.
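This compression step can be illustrated with a small sketch. We assume the ramp-sum form adopted for (4); the greedy binning scheme below is an illustrative stand-in for the paper's method, not a description of it: measurements are pooled into about ten height bins, each keeping a certainty-weighted mean height and the pooled certainty.

```python
# Sketch (assumed method): compress per-cell height measurements to
# ~10 (height, certainty) pairs whose ramp-sum approximates the full
# characterization function.
def characterize(z, heights, certs):
    # Ramp-sum characterization function, as in our reading of (4).
    return sum(c * max(h - z, 0.0) for h, c in zip(heights, certs))

def compress(heights, certs, n_terms=10):
    """Replace N measurements by at most n_terms pairs: bin by height,
    pool certainty, and keep each bin's certainty-weighted mean height."""
    pts = sorted(zip(heights, certs))
    if len(pts) <= n_terms:
        return [h for h, _ in pts], [c for _, c in pts]
    lo, hi = pts[0][0], pts[-1][0]
    width = (hi - lo) / n_terms or 1.0
    sum_hc = [0.0] * n_terms   # certainty-weighted height sums
    sum_c = [0.0] * n_terms    # pooled certainties
    for h, c in pts:
        i = min(int((h - lo) / width), n_terms - 1)
        sum_hc[i] += h * c
        sum_c[i] += c
    hs, cs = [], []
    for s, c in zip(sum_hc, sum_c):
        if c > 0:
            hs.append(s / c)
            cs.append(c)
    return hs, cs
```

Because each bin preserves its total certainty and weighted mean height, the compressed function agrees exactly with the full one at z = 0 (for positive heights) and closely in between, while new measurements need only update the bin sums.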

To obtain a 3-D map, the minimum and maximum data are fused using the characterization functions. A simple method is to find the height at which the two characterization functions are equal; for the example of Fig. 10, this occurs at 0.45 m. More sophisticated methods may use more knowledge about the expected distribution of errors, or may also use data in neighboring cells. Although a single height may be chosen for display, the characterization functions also clearly show the uncertainty in the height estimate, and this information can be used depending on the application. The results of processing the complete depth image of Fig. 3 are shown in Fig. 11. For reference, the camera is located at (2.5, 0, 1.75) and is facing in the positive direction.
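The simple fusion rule can be sketched directly, again under our assumed ramp-sum form for the characterization functions; the sample data are illustrative, not the paper's. One function decreases in z and the other increases, so their difference crosses zero once and bisection finds the fused height.

```python
# Sketch (assumed functional form and data): fuse min/max height data
# by finding where the two characterization functions are equal.
def f_max(z, heights, certainties):
    # Evidence (from maximum-height data) that the surface lies above z.
    return sum(c * max(h - z, 0.0) for h, c in zip(heights, certainties))

def f_min(z, heights, certainties):
    # Evidence (from minimum-height data) that the surface lies below z.
    return sum(c * max(z - h, 0.0) for h, c in zip(heights, certainties))

def fused_height(max_data, min_data, lo=0.0, hi=5.0, iters=60):
    """Bisect for the height where the characterization functions agree:
    f_max is nonincreasing in z and f_min nondecreasing, so their
    difference changes sign exactly once on a bracketing interval."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_max(mid, *max_data) > f_min(mid, *min_data):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = fused_height(([0.4, 0.5, 0.45], [1.0, 1.0, 1.0]),
                 ([0.5, 0.6], [1.0, 1.0]))
```

A more sophisticated fusion could weight the crossing by the local slopes of the two functions, which encode how sharply the data pin down the height.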

VI. FUTURE WORK

Future work will build on the 3-D sensing and modeling capabilities to:

• build full 3-D models of underground mine geometry from stereo vision;

• assist teleoperated loading operations;

• enable automated loading operations.

Fig. 11. Complete depth map.

We will be adding refinements to the surface modeling process. The surface model data structure is where data from multiple image pairs are combined (temporal integration). Each successive 3-D point cloud can be used to refine and expand the surface model by a process of registration and averaging. Registration of a new point cloud to the existing model relies on feature correspondences; these correspondences are guided by intensity features for efficiency and refined using the 3-D data for accuracy. Motion of the LHD provides images from multiple views, which expands the volume covered by the model and may add detail in regions with sparse depth data. Application of robust filters will also improve the quality of the model by rejecting erroneous measurements. During the next year, we plan to integrate the stereo imaging and 3-D model building so that continuous updating of the model is possible. We will also work on modeling and control of the loader during the loading operation. In addition, we will experiment with applying our stereo imaging to other underground environments, e.g., longwall mining.

Task-level planning of the loading operation is required if we are to further automate LHDs. To our knowledge, this level of control has not yet been demonstrated.

VII. CONCLUSIONS

• Automated LHDs will move operators to a safer, healthier environment, thus reducing the risk of injury and health-compromising exposure.

• Sensing and modeling the 3-D shape of underground muckpiles has been demonstrated.

• The ability to detect the presence of humans in the LHD workspace has been demonstrated.

• The stereo vision methods developed here may enable more general and effective automation than the 2-D laser range finders used in other approaches.


REFERENCES

[1] J. Tyson. (1997, Oct.) To see or not to see... that is the question!. [Online]. Available: http://www.uq.edu.au/eaol/oct97/tyson/tyson.html

[2] J. Chadwick, "Mine automation," Mining Mag., vol. 183, no. 1, pp. 12–16, July 2000.

[3] J. P. H. Steele, C. Debrunner, T. Vincent, and M. Whitehorn, "Robotics for underground hardrock mining: What have we got so far, where do we go from here," presented at the SME Annu. Meeting, Salt Lake City, UT, Mar. 2000.

[4] J. Chadwick, "LHD automation in Australia," Mining Mag., vol. 183, no. 1, pp. 4–5, July 2000.

[5] R. Madhavan, "Achieving reliable and robust autonomous navigation of underground vehicles, theory and experimental results," in Proc. Indian Conf. Computer Vision, Graphics and Image Processing, New Delhi, India, Dec. 1998.

[6] P. J. A. Lever, F. Y. Wang, and X. Shi, "Using bucket force/torque feedback for control of an automated excavator," Soc. Mining Metall. Explor. Trans., vol. 300, pp. 135–139, 1997.

[7] O. Faugeras, B. Hotz, H. Mathieu, T. Vieville, Z. Zhang, P. Fua, E. Theron, L. Moll, G. Berry, J. Vuillemin, P. Bertin, and C. Proy, "Real time correlation-based stereo: Algorithm, implementations and applications," Programme 4: Robotique, image et vision, Projet RobotVis, Inst. Nat. Recherche Informatique Automatique, Sophia-Antipolis, France, Tech. Rep. 2013, 1993.

[8] P. Beardsley and P. Torr, "3D model acquisition from extended image sequences," presented at the European Conf. Computer Vision, 1996.

[9] A. W. Fitzgibbon and A. Zisserman, "Automatic camera recovery for closed and open image sequences," presented at the European Conf. Computer Vision, 1998.

[10] H. S. Sawhney and Y. Guo, "Multi-view 3D estimation and application to match move," presented at the IEEE Workshop Multi-View Modeling and Analysis of Visual Scenes, 1999.

[11] G. P. Stein and A. Shashua, "Direct estimation of motion and extended scene structure from a moving stereo rig," presented at the Computer Vision and Pattern Recognition Conf., 1998.

[12] R. Mandelbaum, G. Salgian, and H. Sawhney, "Correlation-based estimation of ego-motion and structure from motion and stereo," presented at the Int. Conf. Computer Vision, 1999.

[13] R. Szeliski, "A multi-view approach to motion and stereo," presented at the Computer Vision and Pattern Recognition Conf., 1999.

[14] P. K. Ho and R. Chung, "Stereo-motion with stereo and motion in complement," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 215–220, Feb. 2000.

[15] S. Thrun, W. Burgard, and D. Fox, "A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping," in Proc. IEEE Int. Conf. Robotics and Automation, Apr. 2000.

[16] T. J. Broida and S. Chandrashekhar, "Recursive 3-D motion estimation from a monocular image sequence," IEEE Trans. Aerosp. Electron. Syst., vol. 26, pp. 639–656, July 1990.

[17] S. Soatto and P. Perona, "Reducing structure from motion: A general framework for dynamic vision—Part 1: Modeling," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 933–942, Sept. 1998.

[18] S. Soatto and P. Perona, "Reducing structure from motion: A general framework for dynamic vision—Part 2: Implementation and experimental assessment," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 943–960, Sept. 1998.

[19] R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE J. Robot. Automat., vol. RA-3, pp. 323–344, Aug. 1987.

[20] R. Willson. (1995) Tsai camera calibration software. [Online]. Available: http://www.cs.cmu.edu/afs/cs/usr/rgw/www/TsaiCode.html

[21] C. W. Sklair, W. A. Hoff, and L. B. Gatrell, "Accuracy of locating circular features using machine vision," in Proc. SPIE—Cooperative Intelligent Robotics in Space II, Nov. 12–14, 1992.

[22] L. B. Gatrell, W. A. Hoff, and C. W. Sklair, "Robust image features: Concentric contrasting circles and their image extraction," in Proc. SPIE—Cooperative Intelligent Robotics in Space II, Nov. 12–14, 1992.

[23] P. J. Burt, "Fast filter transforms for image processing," Comput. Graph. Image Process., vol. 16, pp. 20–51, 1981.

[24] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.

[25] H. K. Nishihara, "PRISM: A practical real-time imaging stereo matcher," Massachusetts Inst. Technol., Cambridge, MA, Memo 780, 1984.

[26] D. J. Kriegman, E. Triendl, and T. O. Binford, "Stereo vision and navigation in buildings for mobile robots," IEEE Trans. Robot. Automat., vol. 5, pp. 792–803, Dec. 1989.

[27] J. J. Leonard, H. F. Durrant-Whyte, and I. J. Cox, "Dynamic map building for an autonomous mobile robot," Int. J. Robot. Res., vol. 11, no. 4, pp. 286–297, 1992.

[28] S. Koenig and R. Simmons, "Unsupervised learning of probabilistic models for robot navigation," in Proc. IEEE Conf. Robotics and Automation, vol. 3, 1996, pp. 2301–2308.

[29] D. Pagac, E. M. Nebot, and H. Durrant-Whyte, "An evidential approach to map-building for autonomous vehicles," IEEE Trans. Robot. Automat., vol. 14, pp. 623–629, Aug. 1998.

[30] A. Arleo, J. del R. Millán, and D. Floreano, "Efficient learning of variable-resolution cognitive maps for autonomous indoor navigation," IEEE Trans. Robot. Automat., vol. 15, pp. 990–1000, Dec. 1999.

Mark Whitehorn (S’97) received the B.S. and M.S. degrees in physics in 1977 and 1981, respectively, from the University of New Orleans, New Orleans, LA, and the M.S. degree in engineering systems in 2001 from the Colorado School of Mines, Golden, where he is currently working toward the Ph.D. degree.

He is developing a stereo vision system for 3-D modeling of scenes in underground mining applications. From 1981 to 1996, he was with Lockheed Missiles and Space Systems and Martin Marietta Aerospace. His work in aerospace included development of image and signal exploitation systems, and the design and implementation of a system for real-time computation of cross-ambiguity functions, used in geolocating RF emitters.

Tyrone Vincent (S’87–M’98) received the B.S. degree from the University of Arizona, Tucson, in 1992, and the M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor, in 1994 and 1997, respectively, all in electrical engineering.

He is currently an Assistant Professor at the Colorado School of Mines, Golden. His research interests include nonlinear estimation, system identification, and fault detection, with applications in materials processing, robotics, and power systems.

Christian H. Debrunner (S’86–M’91) received the B.S. degree in computer engineering and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois, Urbana-Champaign, in 1981, 1984, and 1990, respectively.

From 1990 to 1998, he was with Lockheed Martin, Denver, CO, working on research in the areas of automatic target recognition and automated satellite imagery interpretation. Since 1998, he has been an Assistant Professor at the Colorado School of Mines (CSM), Golden. His research at CSM has focused on the use of motion information in imagery for 3-D motion and structure estimation, human motion analysis, and vision-based control. More recently, he has been studying imaging of welding processes and droplet deposition processes for analysis and control, as well as tomographic reconstruction from fluoroscopic image sequences.

Dr. Debrunner has served on the Editorial Board of Pattern Recognition since 1993.


John Steele (M’89) received the B.S. degree (cum laude) in physics, the M.S. degree in mechanical engineering, and the Ph.D. degree from the University of New Mexico (UNM), Albuquerque, in 1970, 1986, and 1988, respectively.

He was a ferrite heads manufacturing engineer for Ampex for a couple of years before moving to Alaska for eight years, during which time he became a pipefitter and welder and boomed on the Alaska pipeline. While at UNM, he worked in the Tribology Laboratory. He also worked at Sandia National Laboratories in the Robotics Division, building mobile robots. He joined a startup company designing wafer-handling robots for clean-room applications. He then joined the faculty at the Colorado School of Mines (CSM), Golden. He helped start the Laboratory for Intelligent Automated Systems (LIAS) within the Engineering Division and has conducted research on condition monitoring and machine health for intelligent machines. He is one of the Principal Investigators within the Western Mining Resource Center (WMRC), is part of the Center for Welding, Joining, and Coatings Research (CWJCR), and has projects within the Earth Mechanics Institute (EMI). His research interests are in robotics, perception, and intelligent machines. Current projects include robotic systems for mining, pipe welding automation, and robotics for education.