
Generating Edge Voxel Maps with Structure Tensor Analysis

Julian Ryde, Jeffrey A. Delmerico and Jason J. Corso
Computer Science and Engineering
SUNY at Buffalo, Buffalo NY, USA
Email: {jryde,jad12,jcorso}@buffalo.edu

Abstract— Mobile robots need a compact yet sufficiently descriptive representation of their operating environs. For 2D, occupancy grids have proved successful for both SLAM and path planning. Extending these to volumetric 3D occupancy grids requires careful consideration due to the large memory requirements of these dense arrays. One mechanism for reducing the number of voxels in the map is to store only occupied voxels. In this work we extend this further by filtering out the occupied voxels that are on planes, leaving only voxels that contain geometric edges.

Volumetric edge extraction is performed by analyzing the neighborhood of each occupied voxel using the structure tensor, which summarizes the local gradient magnitude and its dominant directions. Classification of the structure tensor eigenvalues permits removal of voxels that are part of planar regions (e.g. floors, walls, and ceilings). The remaining voxels trace a wire-frame-like model of the environment, dramatically reducing the number of voxels required to represent the same space. Fewer voxels in this representation require less memory and enable faster alignment for map building, and the isolation of edge voxels may enable new robotic applications that exploit these edges in 3D.

To judge the efficacy of edge voxel extraction for mapping, we consider the problem of point-based scan-to-scan mapping using three representations: full voxelized versions of the raw scans, edge voxels extracted from these, and octrees. We also assess volumetric scan-to-map alignment performance using all occupied voxels and just edge voxels. We demonstrate speedups in alignment time for both scan-to-scan and scan-to-map matching when using edge voxels, and show no significant change in pose error compared to using all voxels.

I. INTRODUCTION

For indoor mapping, planes seem a natural consideration for landmarks, and indeed numerous researchers have explored plane-based mapping [1]. Although less common in robotic experiments, there are places not suitable for planar mapping, including buildings with curved walls, natural outdoor environments, and extremely cluttered scenes such as those found in search and rescue scenarios. Although planes can be a good way of compressing map information, an observed planar surface is not as constraining to robot pose as feature points and edges.

We propose the use of geometric edges as a compressed representation of 3D volumetric data and develop several algorithms for extracting edge voxels and performing mapping with them. We evaluate the performance of edge voxels in these tasks relative to a full volumetric representation of occupied voxels as well as a compressed octree representation. We demonstrate that edge voxels provide superior performance in mapping (see Fig. 1) while simultaneously introducing semantic information into the representation. Another strong motivation for using edge voxel maps is that they allow localisation by both range and image sensors (see Fig. 3). For many environments, considering only edge voxels removes floors, walls and ceilings, leaving the voxels that lie along the intersections of planes or in cluttered regions, resulting in significant compression of the data.

Fig. 1. A voxel map and the corresponding edge voxel map for the mason hallway dataset.

A rigorous definition of edge voxels is as elusive as one of edge pixels in images. Edges manifest along paths of high contrast in images, and are due to four main causes:

1) Texture change — Abrupt change in surface color.
2) Lighting change — Sharp shadows.
3) Range discontinuity — Abrupt change in distance from the observer.
4) Surface normal change — E.g. the intersection of two planes.

It is important to appreciate the distinction among the causes of image edges. Texture change and illumination edges are not observed by 3D sensors, so the remaining geometric edge types are range discontinuities and abrupt surface normal changes. Surface normal changes are pose invariant; however, edges due to range discontinuities can vary with observer position. These surface normal and range discontinuities are illustrated in the last image of Fig. 2. The cylinder sides in Fig. 2 are examples of range discontinuities: the position of these edges varies in 3D space as the position of the observer shifts, whereas the cylinder rim edge position is consistent regardless of observer position.

Fig. 2. Image with associated edges due to appearance and due to geometry.

For use in mapping, we desire the following characteristics from extracted edge voxels: they should be generally invariant to rotation and translation, and they should be helpful in terms of constraining pose. We hence directly seek the fourth type of edge, which is due to surface normal changes. However, since we are using range data as input, we are vulnerable to range discontinuity edges. In practice, though, with reasonable overlap of adjacent scans, this is not a problem. The extraction should be fast and eliminate a high percentage of occupied voxels in typical mobile robot data whilst ensuring there are always some voxels present in the vast majority of situations.

Observing these desirable edge voxel properties and algorithmic factors, we propose an edge voxel extraction algorithm based on the 3D structure tensor that operates on 3D volumetric occupancy grids in a semi-sparse, blockwise manner (see Sec. III-C). Our eigenvalue classification method directly seeks the types of geometric edges we have argued are most suitable for mapping. Finally, to allow for scalability to larger, realistic scenarios, we have implemented our voxel extraction routine in a semi-sparse blockwise volumetric occupancy grid.

II. RELATED WORK

At this juncture a distinction should be made between maps containing the positions of landmarks or features and denser maps. Feature maps aid localisation but do not allow obstacle detection. Dense maps consisting of point clouds or occupancy grids enable localisation, path planning and obstacle avoidance.

The following list summarises these map types in order of increasing sparsity and places edge voxel maps in the context of the broader research. Sparser maps are smaller, easier to store and quicker to process; however, they do not support as wide a range of robot tasks.

• 3D occupancy grid — Extension of 2D occupancy grids
• Occupied voxel list — Occupied voxels only
• Edge voxel map — Non-planar voxels
• Feature map — List of point features and their covariances

In [2], conventional landmark-based SLAM is extended to incorporate edge information by the extraction of edgelets from the scene image. However, the resulting map is still a subset of the edge pixels visible in the scene.

Fig. 3. Potential for image-edge-based visual localization to an edge map: 3D mesh overlaid with edge voxel markers (in red) and camera images of the same scene with edge extraction performed (edges in red). These edge voxels were produced by processing the mesh's occupancy grid using the proposed method. Note that the edge voxels include those that are occluded from the camera's perspective, voxels along the boundary of the mesh, and artifacts of meshification that are not normally present in voxelized scan data.

The building of maps whilst considering all occupied voxels has proved successful for both indoor and outdoor environments [3]. There is a continuum in sparsity ranging from full 3D occupancy to feature maps. Feature extraction, whilst extremely helpful, comes at a price, namely reduced generalisation: mapping will fail in environments without the requisite features. We observe that for indoor environments, while reliable point features can sometimes be absent, there are usually edge features. Edge mapping is faster and the associated maps are smaller and therefore require less memory. In the worst case, if there are not enough edges available, it is possible to fall back to full matching of the occupied voxels.

Although the structure tensor has been widely used in image [4] and video analysis [5], [6], these earlier uses incorporate different analyses of the tensor due to the nature of the data. We are not aware of any prior work using the structure tensor on voxel occupancy maps.

Some preliminary results on our method were presented in [7]. Approximately concurrent to that work, a keypoint detector based on a 3D extension of the Harris corner operator was introduced to the experimental branch of the Point Cloud Library [8]. This detector operates on local normals of points, without a direct volumetric analog, so we do not compare performance to that method here. A related approach for selecting interest points on 3D meshes was introduced in [9]. These are the only other approaches we know of that apply structure tensor analysis to 3D geometric data.

III. EDGE VOXEL EXTRACTION

The determination of geometric edges in 3D data can be carried out on either a point cloud or a voxel representation. If it is done on the voxel representation then much of the work from the computer vision community can be extended and applied: a 3D occupancy grid is a very similar structure to a 2D image. Finding edges in 3D is the equivalent of finding corners in 2D images. In image processing, a corner is a 0D entity in the 2D image, and for volumetric edge extraction we are looking for edges (1D) throughout a volume (3D), a reduction of two dimensions in each case. Considering feature extraction as dimension reduction means these are equivalent. By this analysis, determining edges (1D) in an image (2D) is equivalent to extracting planes (2D) in 3D, both a dimension reduction of one.

Operating on the point cloud allows algorithms based on Principal Component Analysis (PCA). Groups of points, either those in a voxel or a point and its k nearest neighbours within a region, can be analyzed and the resulting eigenvalues used to determine the planar nature of the points. These planar regions of the scan and map can be detected and removed prior to map matching. For many indoor environments this can remove a large number of the occupied voxels, leaving a subset that includes the geometric edge voxels.
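As a rough illustration of this PCA-based planarity test (a sketch under our own assumptions, not the implementation evaluated here; the function name and the choice of k = 20 are illustrative), the covariance eigenvalues of each point's k nearest neighbours can be computed as follows:

import numpy as np
from scipy.spatial import cKDTree

def planarity_ratios(points, k=20):
    """Return lambda_min / lambda_max of each point's neighbourhood covariance."""
    # points: (N, 3) float array; a near-zero ratio indicates a planar neighbourhood.
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # k nearest neighbours per point
    ratios = np.empty(len(points))
    for i, nb in enumerate(idx):
        cov = np.cov(points[nb].T)            # 3x3 covariance of the neighbourhood
        eigvals = np.linalg.eigvalsh(cov)     # ascending eigenvalues
        ratios[i] = eigvals[0] / eigvals[2]   # near zero => planar
    return ratios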

The main advantage of point-based metrics is that they can operate at a higher resolution; however, the number of points recorded in the map grows unbounded with time. This map growth occurs for a continuously operating mobile robot even if it does not explore new areas.

Secondly, voxels can encode the difference between unknown and free space. This information improves edge classification in the face of incomplete data. For instance, apparent edge voxels bordering on unknown regions should not be classified as edge voxels, because it is possible that the unknown voxels next to them are occupied but as yet unobserved.

Sources of inspiration for volumetric edge detection come from the image processing literature, including Haar features [10] and bilateral filtering, mentioned in [11] and used in [12]. In [13], gradient information is used to extract point features from range images, but straight-line edge points are specifically excluded. However, in three dimensions, edges provide useful features for matching and scan compression. We experimented with template matching, Haar features, difference of Gaussians (DoG), and PCA for edge voxel extraction (or planar voxel exclusion) but found limitations to these approaches. Template matching and Haar feature responses required axis alignment of edges to produce good results, and DoG and PCA failed to isolate the edge voxels well from planar voxels. Instead, we analyze the eigenvalues of the 3D structure tensor for each occupied voxel in order to extract edges.

The discrete structure tensor operating on 3D volumetric occupancy grids is similar in premise to PCA on the original points. The structure tensor captures the distribution and coherence of the gradient structure in a local neighborhood of a point in space. The relative magnitudes of the eigenvalues of the structure tensor can be used to classify whether a voxel is an edge, corner, or planar voxel. A large eigenvalue indicates that there is a large local gradient in the direction of its corresponding eigenvector. This is the same principle behind the Harris corner detection method [4] for extracting edges and corners in images. In 2D, analysis of the 2 × 2 structure tensor can classify whether an image pixel is an edge (one large and one small eigenvalue), a corner (two large eigenvalues), or a point that is not of interest (two small eigenvalues).

Consider the three eigenvalues of the structure tensor in three dimensions. For a plane, the structure tensor has one large eigenvalue and two small ones. A line or edge will have two large and one small. For an isolated region in space all eigenvalues are large, and for a homogeneous region all the eigenvalues are small. These latter two cases are rare because isolated unsupported regions are unusual, and range sensors operating in normal environments (without volumetric substances such as fog or cloudy water) naturally only deliver voxels at the interfaces or surfaces in the surroundings.

There do not appear to be any previous attempts at using structure tensor analysis for edge detection on voxelized data; however, the 3D structure tensor has previously been applied to video data [5], [6]. Although those approaches use eigenvalue analysis, they are looking for boundaries of objects, which in the dense volumetric data of video present themselves the way planes do in our data. Consequently, their methods for extracting the desired regions differ due to the nature of their data.

A. Structure Tensor

The 3D structure tensor is derived from the weighted sum of squared differences between shifted volume patches. Consider a subvolume V of an occupancy grid I and the same volume shifted by (x, y, z). The weighted sum of squared differences for that shifted patch, S(x, y, z), is:

$$S(x,y,z) = \sum_{v \in V} w(v)\,\big(I(v + (x,y,z)) - I(v)\big)^2 \qquad (1)$$

where w is some weighting function defined over each voxel v in the subvolume. Using the Taylor series expansion of I(v + (x, y, z)):

$$I(v + (x,y,z)) \approx I(v) + I_x(v)\,x + I_y(v)\,y + I_z(v)\,z \qquad (2)$$

this can be simplified to:

$$S(x,y,z) = \sum_{v \in V} w(v)\,\big(I_x(v)\,x + I_y(v)\,y + I_z(v)\,z\big)^2 \qquad (3)$$

Page 4: Generating Edge Voxel Maps with Structure Tensor Analysis

which can be written in matrix form as:

$$S(x,y,z) \approx \begin{pmatrix} x & y & z \end{pmatrix} A \begin{pmatrix} x \\ y \\ z \end{pmatrix} \qquad (4)$$

where A is the structure tensor for the original volume:

$$A = \sum_{v \in V} w(v) \begin{pmatrix} I_x^2 & I_x I_y & I_x I_z \\ I_x I_y & I_y^2 & I_y I_z \\ I_x I_z & I_y I_z & I_z^2 \end{pmatrix} \qquad (5)$$

Computed over a subvolume centered at some voxel p = (x_p, y_p, z_p) in the occupancy grid, the eigenvalues of the structure tensor summarize the gradient structure in that local neighborhood around p, and can be used to determine whether p is an edge or not.

In order to compute the structure tensor eigenvalues for each voxel in an occupancy grid, we proceed in the following steps. The occupancy grid (which is binary) is first pre-smoothed with a multivariate Gaussian filter, which additionally permits easy computation of the partial derivatives with Gaussian partials instead of finite difference estimates. By smoothing with a kernel having half-width h, the partial x derivative at a voxel p = (x_p, y_p, z_p) can be computed with:

$$I_x(p) = -x \exp\left[-\left(\frac{x_p^2}{(h/2)^2} + \frac{y_p^2}{(h/2)^2} + \frac{z_p^2}{(h/2)^2}\right)\right] \qquad (6)$$

for x ∈ [−h, h], with the y and z partial derivatives computed similarly. Note that we ignore the scale factor of the Gaussian derivatives, as it results in a constant factor on the structure tensor that scales all of the eigenvalues at all voxels equally. Therefore, we omit this extra computation and incorporate the constant factor into the threshold for edge voxels.

Then for each occupied voxel p, we consider a k × k × k neighborhood centered at p (a neighborhood of half-width h such that k = 2h + 1). For each voxel in that neighborhood, we use a 3D Gaussian weighting function:

$$w = \exp\left[-\left(\frac{x_p^2}{h} + \frac{y_p^2}{h} + \frac{z_p^2}{h}\right)\right] \qquad (7)$$

and as with the Gaussian derivatives, we omit the scale factor. This is combined with the previously computed partial derivatives to determine the entries of the structure tensor at p. The partial derivatives and the weighting kernel are all separable, so in practice these correlations are performed with three one-dimensional kernels. Eigendecomposition of the structure tensor A at p yields eigenvalues s, m, and l (for smallest, middle, and largest) in ascending order of magnitude.
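As a concrete sketch of these steps (our own illustrative code operating on a dense numpy occupancy array rather than the blockwise structure of Sec. III-C; the names grid and h are assumptions), the separable correlations of Eqs. (6) and (7) and the per-voxel eigendecomposition might look like:

import numpy as np
from scipy import ndimage

def structure_tensor_eigenvalues(grid, h=2):
    """Return an array of shape grid.shape + (3,) with eigenvalues s <= m <= l."""
    x = np.arange(-h, h + 1, dtype=float)
    gd = np.exp(-x**2 / (h / 2) ** 2)   # smoothing factor of the derivative kernel (Eq. 6)
    dg = -x * gd                        # derivative factor of the kernel (Eq. 6)
    gw = np.exp(-x**2 / h)              # Gaussian weighting window factor (Eq. 7)

    def correlate3(vol, kernels):
        # Separable 3D correlation: one 1D kernel per axis.
        for axis, k in enumerate(kernels):
            vol = ndimage.correlate1d(vol, k, axis=axis, mode="constant")
        return vol

    I = grid.astype(float)
    Ix = correlate3(I, [dg, gd, gd])    # Gaussian partial derivatives
    Iy = correlate3(I, [gd, dg, gd])
    Iz = correlate3(I, [gd, gd, dg])

    A = np.empty(grid.shape + (3, 3))   # structure tensor per voxel (Eq. 5)
    for (i, j), prod in {(0, 0): Ix * Ix, (1, 1): Iy * Iy, (2, 2): Iz * Iz,
                         (0, 1): Ix * Iy, (0, 2): Ix * Iz, (1, 2): Iy * Iz}.items():
        A[..., i, j] = A[..., j, i] = correlate3(prod, [gw, gw, gw])

    return np.linalg.eigvalsh(A)        # ascending eigenvalues per voxel

Scale factors are omitted throughout, as discussed above, since they only rescale all eigenvalues uniformly.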

B. Eigenvalue Classification

Fig. 4. 2D histogram of middle and largest eigenvalues for voxels in a typical 3D scan. Here, red represents the highest frequency and dark blue a frequency of zero. Ideal edges (two large, equal eigenvalues) lie along the y = x diagonal, and planes (one large, two small eigenvalues) lie along the left side. Other edge-like and prominent voxels are represented in the bins that are far from the y-axis. The black line represents our thresholding approach of selecting edge-like voxels with two eigenvalues greater than 1.0 (to the right of the line) and removing planar voxels with only one large eigenvalue (to the left of the line).

Our approach to selecting edge voxels based on their eigenvalues is motivated by the physical meaning of the structure tensor and reinforced by an analysis of real world scan data. Much as the eigenvalues and eigenvectors of the covariance matrix in Principal Component Analysis explain the orthogonal directions of greatest variance, the eigenvalues and eigenvectors of the structure tensor reveal the directions of greatest gradient. Geometrically, we expect that edge-like structures in the point cloud data will have two large eigenvalues (indicating two orthogonal directions with large gradients) and one small eigenvalue (for the direction along the edge). We enforce this property by thresholding the eigenvalues by their magnitudes.

From the set of all occupied voxels, we select candidate edge voxels that have $s < t_{mag}$, $m > t_{mag}$, and $l > t_{mag}$ for some magnitude threshold $t_{mag}$. We selected a threshold value of $t_{mag} = 1.0$ based on analysis of the distribution of eigenvalues obtained from a typical 3D point cloud with a 5 × 5 × 5 structure tensor kernel. Our motivation for thresholding comes from an analysis of real world data as shown in Fig. 4. We selected this threshold value empirically based on the histogram; almost all voxels pass this test for the smallest and largest eigenvalues, but it provides some discriminating power at the middle eigenvalue. This threshold excludes the peaks in the distribution along the y-axis (planar voxels) but keeps the more prominent voxels with two large eigenvalues. We have experimented with the value of $t_{mag}$ and found it to be robust and flexible, and we have explored other approaches for this classification, finding that a simple magnitude threshold produces the "best looking" edges (see Fig. 5).
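Continuing the sketch from Sec. III-A (same assumed names, with eig = structure_tensor_eigenvalues(grid) and grid a boolean occupancy array), this magnitude test reduces to a few vectorized comparisons:

# Eigenvalue magnitude test of Sec. III-B applied to every occupied voxel.
t_mag = 1.0
s, m, l = eig[..., 0], eig[..., 1], eig[..., 2]
edge_mask = grid & (s < t_mag) & (m > t_mag) & (l > t_mag)
edge_voxels = np.argwhere(edge_mask)   # (N, 3) integer edge voxel coordinates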

The previous structure tensor approaches to video segmentation [5], [6] also compute metrics on the eigenvalues, but focus more on coherency measures. Wang and Ma [6] do compute an edge measure, $\left(\frac{l-m}{l+m}\right)^2$ over the larger two eigenvalues, that also seeks to capture the relationship of the largest two eigenvalues, but they provide no motivation for that particular metric; we instead chose our criterion based on the above analysis of our data, which is very different from the dense data of video.

Fig. 5. Example regions of a scan from bremen city showing the voxelized scan and the edges extracted using our method.

Algorithm 1 Edge Voxel Extraction

  Load point cloud from sensor into occupied voxel list L of resolution r.
  Let V be the set of all occupied voxels in L
  Let N be the set of all voxels within a k × k × k neighborhood of a voxel in V
  Let S = V ∪ N
  for all s ∈ S do
    Compute I_x(s), I_y(s), and I_z(s)
    Compute (I_x(s))², (I_y(s))², (I_z(s))², I_x(s)I_y(s), I_x(s)I_z(s), and I_y(s)I_z(s)
  end for
  Let G be the k × k × k Gaussian kernel
  Let E be an empty occupied voxel list to hold edge voxels
  for all v ∈ V do
    Compute A_xx = G ∗ (I_x(s))² (similarly for A_xy, A_xz, A_yy, A_yz, and A_zz)
    Compute the structure tensor for v:
      A(v) = [ A_xx  A_xy  A_xz ;  A_xy  A_yy  A_yz ;  A_xz  A_yz  A_zz ]
    Determine the eigenvalues of A(v) and sort them in ascending order: s ≤ m ≤ l
    if s < t_mag, m > t_mag, l > t_mag, and m/l > t_ratio then
      Classify v as an edge voxel and add v to E
    end if
  end for

C. Semi-Sparse Blockwise Implementation

In practice, edge voxel extraction is performed by discretizing 3D sensor data at a chosen resolution and storing it in a semi-sparse blockwise data structure.

This blockwise structure consists of a coarse resolution (1 m) dense grid that contains pointers for non-empty coarse voxels. The pointer at each occupied coarse voxel points to the corresponding fine resolution (0.05 m) dense occupancy grid for that coarse voxel. Fig. 6 is a 2D illustration of this. It is similar in ethos to an octree, but with only one level of indirection it is much faster to access a particular voxel, at the cost of increased memory. The memory consumption is still vastly reduced compared to a dense occupancy grid at the fine resolution. If occupied voxels were distributed randomly throughout space the semi-sparse approach would offer less memory savings; however, due to the inherently clustered nature of typical real 3D data, the blockwise data structure is suitable. Secondly, often not just a single voxel but a group of voxels in a neighborhood is needed, and retrieving a dense array of these is very fast.

Fig. 6. 2D illustration of the semi-sparse blockwise data structure used, which enables fast access whilst reducing memory requirements for processing large datasets. It is a hybrid structure consisting of a dense coarse (1 m) grid of pointers to fine grids (0.05 m) for occupied coarse grid voxels.

TABLE I
PROPERTIES OF DATASETS USED IN MAPPING EXPERIMENTS

Name           | Scans | Total Points | Volume (m³)
mason lab      | 9     | 20,000,000   | 12 × 15 × 2.5 = 450
mason hallway  | 10    | 21,000,000   | 17.5 × 22.5 × 3 = 1181
bremen city    | 13    | 215,000,000  | 800 × 842 × 201 = 135,393,600

The structure tensor and its eigenvalues are then computed for each occupied voxel, and edge voxels are identified by thresholding the eigenvalues, as described above. The result is an occupancy list of edge voxels that can then be used for creating an edge voxel map and localizing to it. This procedure is summarized in Algorithm 1.

Unlike a dense representation such as an occupancy grid, the semi-sparse blockwise structure only contains data for the occupied coarse voxels, so the kernel operations (smoothing, derivatives, structure tensor weighting) that require neighboring values operate on each occupied coarse block in turn, and not on the full occupancy grid at once. For all of these operations, a coarse block is processed as a dense array for its edges according to Alg. 1, and the extracted edges are added to a list of edges for the full blockwise structure.
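A minimal sketch of such a two-level structure follows (our own illustration; a Python dict stands in for the dense coarse grid of pointers described above, and all names are assumptions):

import numpy as np

FINE = 0.05                 # fine voxel resolution (m)
B = 20                      # fine voxels per 1 m coarse block edge

class BlockwiseGrid:
    def __init__(self):
        self.blocks = {}    # coarse index (3-tuple) -> dense boolean block

    def insert_points(self, points):
        """Mark the fine voxels containing each of the (N, 3) metric points."""
        fine = np.floor(points / FINE).astype(int)
        coarse, local = fine // B, fine % B
        for c, f in zip(map(tuple, coarse), local):
            block = self.blocks.setdefault(c, np.zeros((B, B, B), dtype=bool))
            block[tuple(f)] = True

    def occupied_blocks(self):
        """Iterate dense coarse blocks for blockwise kernel processing (Alg. 1)."""
        return self.blocks.items()

Each dense block returned by occupied_blocks can then be handed to a routine like the structure tensor sketch of Sec. III-A; in practice a halo of voxels from neighbouring blocks would be included so that kernels behave correctly at block boundaries.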

IV. EXPERIMENTS

We performed mapping experiments using three 3D datasets, whose specifications are detailed in Table I. The dataset provided in [14] has two components, mason lab and mason hallway, which were captured with a Riegl LMS-Z390 laser scanner in indoor environments. The bremen city [15] dataset was recorded with a Riegl VZ-400 laser scanner in the city center of Bremen, Germany. All three sets include a ground truth sensor pose for each scan.

Fig. 7. Execution timings, pose errors, and map voxel counts for the three experimental datasets (top to bottom): mason lab, mason hallway, and bremen city. Methods compared: Octrees (3DTK), Edge voxels (3DTK), All voxels (3DTK), Edge voxels (MROL), and All voxels (MROL). Alignment time shows the scan-by-scan execution time to align each scan with the previous one (3DTK) or the map (MROL) from a starting pose that is perturbed from the ground truth. Total time represents the breakdown in end-to-end processing time for each method, counting edge extraction/octree compression time, alignment, and map update. The pose error plots show position and rotation errors for each aligned scan relative to the ground truth pose in a global coordinate frame. Finally, the estimated poses were used to align the raw scans, and the resulting map was voxelized; the voxel count plots show the total number of occupied voxels for each method's final map.

For each of these datasets, we analyze the performance of edge voxel extraction (Sec. IV-A), and then use them to perform scan-to-scan matching by iterative closest point (ICP) and scan-to-map matching using multi-resolution occupied voxel list mapping (Sec. IV-B). We perform the same mapping procedure using the full set of occupied voxels prior to edge extraction as a performance baseline. We also compare the performance of edge voxels to octrees with ICP, in order to evaluate scan compression methods for mapping. We use the ICP implementation in the 3D Toolkit's slam6d utility as our scan-to-scan matching system [16]. For scan-to-map matching, we use the voxel-based alignment algorithm from the Multi-Resolution Occupied Voxel Lists (MROL) library [17]. Voxel and edge voxel center points are used as the point locations in ICP alignment, as are the octree leaf node centers. Mapping efficacy is judged on speed and alignment accuracy. Execution times in all experiments represent the processing time for a task on a single core of a 2.4 GHz CPU.

A. Edge Voxel Extraction

We extract the edge voxels for an input scan by loading the full point cloud into a blockwise structure and processing each occupied coarse block for edges at the fine resolution. For the fine voxels that satisfy the eigenvalue thresholding procedure (see Sec. III-B), we output their geometric centers as points in a point cloud. These extracted edge voxel centers can then be used for mapping.

TABLE II
COMPRESSION FACTORS FOR EDGE VOXELS AND OCTREES OVER ALL VOXELS

Dataset        | Edge voxels | Octrees
mason lab      | 2.53        | 1.98
mason hallway  | 2.87        | 1.75
bremen city    | 4.47        | 1.59

We evaluate the edge voxel extraction procedure by considering the computation time as well as the scalability of the blockwise processing approach relative to the volume of 3D space being processed (see Fig. 7). We also assess the compression rate of the input data relative to the octree compressed format (see Table II).

Based on the total time graphs in Fig. 7, overall processing time for edge voxels in the current implementation is dominated by their extraction, despite improvements in other phases of the task. Except for the mason hallway dataset with MROL, these other speedups do not result in reduced overall processing time. However, the semi-sparse blockwise structure used for processing offers tremendous built-in parallelism, and an accelerated version of this approach could remedy the currently high cost of edge voxel extraction.

Fig. 8. Edge voxel maps for, top to bottom, mason lab and mason hallway, and a region of the bremen city edge voxel map that has been selected to show detail due to the size of the full map. The mason dataset edge voxel maps were aligned experimentally using MROL from perturbed poses. Due to the poor performance of all tested methods on the bremen city dataset in overcoming introduced error, this edge voxel map is shown here aligned from ground truth poses for clarity.

Table II summarizes the average compression rate per scan for each compressed type over the full set of voxels. While the actual rate varies by dataset (and likely by the relative density and planarity of the data within it), edge voxels demonstrate significantly higher compression rates than octrees on all of the datasets, with the disparity in their compression rates increasing with the size of the dataset.

B. Mapping Experiments

We evaluate these different representations (voxels, octrees, edge voxels) with two mapping methods, scan-to-scan matching with ICP [16] and scan-to-map matching with MROL [17], and summarize the quantitative results in Figure 7. We use a 5 cm resolution for all three formats: voxel size, the blockwise fine resolution for edge voxels, and octree leaf node size. We use the voxel centers and octree leaf node centers as points in performing ICP. Since MROL mapping requires voxelized inputs, we use MROL only on voxels and edge voxels. For the experiments, a pose error is introduced to the ground truth pose. The pose error is normally distributed with a translational σ of 0.5 m and a rotational σ of 10 degrees, and for each scan the same error is used across all experimental methods.
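A sketch of such a perturbation (illustrative only; the exact parameterization, e.g. per-axis Euler angles, is our assumption, as is the helper name):

import numpy as np
from scipy.spatial.transform import Rotation

def perturb_pose(T_gt, rng, t_sigma=0.5, r_sigma_deg=10.0):
    """Apply normally distributed rotation/translation error to a 4x4 pose T_gt."""
    T_err = np.eye(4)
    T_err[:3, :3] = Rotation.from_euler(
        "xyz", rng.normal(0.0, r_sigma_deg, 3), degrees=True).as_matrix()
    T_err[:3, 3] = rng.normal(0.0, t_sigma, 3)
    return T_err @ T_gt   # perturbed starting pose handed to the aligner

Here rng would be, for example, np.random.default_rng(seed) so that the same error can be reused across all experimental methods for a given scan.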

We are evaluating the relative robustness of these representations, not the effectiveness of the alignment algorithms, so we use both slam6d ICP and MROL mapping with their default settings. Consequently, not all of the mapping trials produced well aligned scans, including all methods on the bremen city dataset, but this lets us judge the mapping properties of the representations more effectively than if the task were trivial.

For each dataset, the Alignment Time plot shows the execution time to align each scan of the set using various algorithm and representation pairings. The Total Time plot shows the end-to-end time for the full trial, including extraction/compression (if using edge voxels or octrees), alignment, and map update times. Pose error is assessed with position and rotational errors, as well as voxel count, which is computed by aligning the original scans using the estimated poses from each method, voxelizing the map, and counting the voxels. The intuition in using voxel count as a measure of accuracy is that if two scans are poorly aligned, the resulting map will contain many more occupied voxels, so we seek a voxel count that is close to that of the ground truth. Translational and angular pose errors are computed relative to the ground truth pose in the global coordinate frame, so the errors indicated are cumulative.
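The voxel-count metric can be computed as in the following sketch (illustrative code; scans as raw (N, 3) point arrays and poses as estimated 4 × 4 transforms are assumed inputs):

import numpy as np

def map_voxel_count(scans, poses, res=0.05):
    """Count distinct occupied voxels in the map built from the estimated poses."""
    occupied = set()
    for pts, T in zip(scans, poses):
        world = pts @ T[:3, :3].T + T[:3, 3]   # rigid transform into the map frame
        occupied.update(map(tuple, np.floor(world / res).astype(int)))
    return len(occupied)                       # poor alignment inflates this count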

Note that the bremen city set lacks a trial for all voxels using MROL. Due to memory constraints (8 GB), it was not possible to produce a map using all of the voxels for a dataset of this size.

V. ANALYSIS

The results of our experiments in edge voxel extraction and mapping provide some insights into the utility of the edge voxel representation.

A volumetric mapping approach (MROL) works very well with edge voxels, providing consistently low error for all datasets. This error rate is at least as low as the error from all voxels with MROL, and typically many times lower than all the options with 3DTK. However, this robustness comes at the expense of alignment speed, since the volumetric approach is consistently slower than ICP.

The results of the ICP trials indicate that edge voxels (and voxels) are not as effective a representation for ICP as octrees on small datasets. However, edge voxels work much better on large data, although still not well enough to overcome the introduced error. On the small datasets, edge voxel alignment is at least consistent with using all voxels, and on mason hallway it is considerably better.

All of the trials fared somewhat poorly on the bremen city dataset. Although the MROL edge voxel trial maintained the lowest positional and rotational errors, the resulting map is still not satisfactory. Due to the large volume of space covered in each scan, it is possible that both mapping algorithms are unable to handle the separation between points that should coincide when a small angular perturbation is propagated out several hundred meters from the sensor origin.

Edge voxel compression decreases memory consumption by a significantly larger factor than octree compression (up to ∼3×). The volumetric mapping approach is more limited by memory than ICP, but edge voxel compression makes it possible to build a map at the scale of the bremen city dataset, which is not possible using all voxels.

Although edge voxel extraction is currently the performance bottleneck when using that representation, parallelization could enable fast end-to-end processing.

Final edge voxel maps are shown in Fig. 8. These maps show promise for performing image-edge-based visual localization to an edge voxel map.

VI. CONCLUSION

We apply an edge voxel classification based on the 3D structure tensor to specifically exclude planar regions and perform scan-to-map matching of the resulting wire-frame-like models. This approach is novel: whilst others have fit planes or straight lines to point clouds and then matched those, we operate on voxels rather than points and remove the voxels in planar regions. Because edge voxels arise from the rejection of planar voxels, they need not lie on straight lines, and our approach works in cluttered environments. We choose to reject the planar voxels because they occur more frequently when there are flat walls, floors and ceilings, and thus their removal results in a large reduction in the voxels that have to be stored. Their higher occurrence also makes them less distinctive and consequently less useful for scan-to-map matching and pose determination.

We aim to improve the quality of edge voxel extraction by accounting for the difference between unknown and free voxels. At the moment, only occupied voxels are maintained, and so occupied voxels bordering on unknown voxels are erroneously classified as edge voxels. Over time this results in the degradation of map quality as erroneous edge voxels accumulate in the map.

Whilst feature-based SLAM has proven effective in many situations, it relies on successful feature extraction and identification, which cannot be guaranteed. Scan-to-map matching approaches do not require feature extraction or association and are therefore immune to this problem. Their downside is that they require much more memory and computation. The extraction of edge voxels by filtering out the often numerous planar voxels reduces the map size and accelerates scan alignment. In the experiments presented, involving laser range data, we record a 2.5–4.5× decrease in map storage and a similar increase in alignment speed (Fig. 7). Even if edge voxel extraction fails, this approach can still seamlessly fall back to full occupied voxel matching, making its operation robust in many environments.

ACKNOWLEDGEMENTS

This material is based upon work partially supported by the Federal Highway Administration under Cooperative Agreement No. DTFH61-07-H-00023, the Army Research Office (W911NF-11-1-0090) and the National Science Foundation CAREER grant (IIS-0845282). Any opinions, findings, conclusions or recommendations are those of the authors and do not necessarily reflect the views of the FHWA, ARO, or NSF.

REFERENCES

[1] K. Pathak, A. Birk, N. Vaskevicius, M. Pfingsthorn, S. Schwertfeger, and J. Poppinga, "Online 3D SLAM by registration of large planar surface segments and closed form pose-graph relaxation," Journal of Field Robotics: Special Issue on 3D Mapping, vol. 27, no. 1, pp. 52–84, 2010.

[2] E. Eade and T. Drummond, "Edge landmarks in monocular SLAM," in Proc. British Machine Vision Conference, 2006.

[3] J. Ryde and N. Hillier, "Alignment and 3D scene change detection for segmentation in autonomous earth moving," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2011.

[4] C. Harris and M. Stephens, "A combined corner and edge detector," in Alvey Vision Conference, vol. 15, Manchester, UK, 1988, p. 50.

[5] G. Kuhne, J. Weickert, O. Schuster, and S. Richter, "A tensor-driven active contour model for moving object segmentation," in Proceedings of the International Conference on Image Processing, vol. 2. IEEE, 2001, pp. 73–76.

[6] H. Wang and K. Ma, "Automatic video object segmentation via 3D structure tensor," in Proceedings of the International Conference on Image Processing, vol. 1. IEEE, 2003, pp. I–153.

[7] J. Ryde and J. A. Delmerico, "Extracting edge voxels from 3D volumetric maps to reduce map size and accelerate mapping alignment," in Computer and Robot Vision (CRV), 2012 Ninth Conference on. IEEE, 2012, pp. 330–337.

[8] R. B. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 2011.

[9] I. Sipiran and B. Bustos, "Harris 3D: a robust extension of the Harris operator for interest point detection on 3D meshes," The Visual Computer, vol. 27, no. 11, pp. 963–976, 2011.

[10] X. Cui, Y. Liu, S. Shan, X. Chen, and W. Gao, "3D Haar-like features for pedestrian detection," in Multimedia and Expo, 2007 IEEE International Conference on, July 2007, pp. 1263–1266.

[11] B. Steder, R. B. Rusu, K. Konolige, and W. Burgard, "Point feature extraction on 3D range scans taking into account object boundaries," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2011.

[12] A. Ansar, A. Castano, and L. Matthies, "Enhanced real-time stereo using bilateral filtering," in 2nd Int. Symp. on 3D Data Processing, 2004.

[13] B. Steder, G. Grisetti, and W. Burgard, "Robust place recognition for 3D range data based on point features," in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2010, pp. 1400–1405.

[14] J. Mason, S. Ricco, and R. Parr, "Textured occupancy grids for monocular localization without features," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9–13, 2011.

[15] D. Borrmann and J. Elseberg. (2013, February) Bremen city dataset. [Online]. Available: http://kos.informatik.uni-osnabrueck.de/3Dscans/

[16] A. Nüchter, K. Lingemann, J. Hertzberg, and H. Surmann, "6D SLAM - 3D mapping outdoor environments," Journal of Field Robotics, vol. 24, no. 8-9, pp. 699–722, 2007.

[17] J. Ryde and H. Hu, "3D mapping with multi-resolution occupied voxel lists," Autonomous Robots, vol. 28, no. 2, pp. 169–185, Feb. 2010.