Image Restoration for 3D Computer Vision


Page 2: Image Restoration for 3D Computer Vision

Deep Learning for Structure-from-Motion (SfM)
https://www.slideshare.net/PetteriTeikariPhD/deconstructing-sfmnet

Dataset creation for Deep Learning-based Geometric Computer Vision problems
https://www.slideshare.net/PetteriTeikariPhD/dataset-creation-for-deep-learningbased-geometric-computer-vision-problems

Emerging 3D Scanning Technologies for PropTech
https://www.slideshare.net/PetteriTeikariPhD/emerging-3d-scannng-technologies-for-proptech

Geometric Deep Learning
https://www.slideshare.net/PetteriTeikariPhD/geometric-deep-learning

Page 3: Image Restoration for 3D Computer Vision

Definitions
Image restoration concepts and 3D data structures

Page 4: Image Restoration for 3D Computer Vision

Data structures for real estate scans
RGB+D: a pixel grid representing color and depth, also called a 2.5D image

Example from Prof. Li

Mesh (Polygon) from voxel data (“3D pixels”)

Voxel grid meshing using marching cubes (StackExchange)

Point Cloud: typically unordered data (i.e. not on a grid, but sparse points at non-integer coordinates)
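A minimal NumPy sketch contrasting the three data structures above; all shapes and values are illustrative placeholders, not taken from any real scan:

```python
import numpy as np

# RGB+D / 2.5D image: color and depth share one regular pixel grid.
h, w = 480, 640
rgb = np.zeros((h, w, 3), dtype=np.uint8)    # color channels
depth = np.zeros((h, w), dtype=np.float32)   # one depth value per pixel

# Voxel grid ("3D pixels"): dense occupancy over a regular 3D lattice;
# marching cubes would turn this into a polygon mesh.
voxels = np.zeros((128, 128, 128), dtype=bool)
voxels[40:80, 40:80, 40:80] = True           # a solid block of occupied cells

# Point cloud: unordered N x 3 array of real-valued coordinates,
# not aligned to any grid.
points = np.random.rand(10_000, 3) * 5.0     # sparse, non-integer positions
print(rgb.shape, voxels.sum(), points.shape)
```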

Page 5: Image Restoration for 3D Computer Vision

Denoising: remove noise from the signal

A spatially cohesive superpixel model for image noise level estimation
Peng Fu, Changyang Li, Weidong Cai, Quansen Sun
Neurocomputing, Volume 266, 29 November 2017, Pages 420-432
https://doi.org/10.1016/j.neucom.2017.05.057

Superpixels generated by SCSM and SLIC from the noisy “Fish” image at various noise levels. (a)–(d) Original and noisy images; noise SD values of 10, 20, and 30 from (b) to (d), respectively. (e)–(h) Superpixels generated by SCSM from the corresponding images. (i)–(l) Superpixels generated by SLIC from the corresponding images.

This paper proposes an automatic noise level estimation method. In contrast with the conventional rectangular-block division algorithms, the images are decomposed into superpixels that adhere better to the local image structures, thus generating a division into small regions that are more likely to be homogeneous. Moreover, the effective use of spatial neighborhood information makes the SCSM less sensitive to image noise.

https://sites.google.com/site/pierrickcoupe/softwares/denoising-for-medical-imaging/mri-denoising
http://dx.doi.org/10.1002/jmri.22003

https://youtu.be/5Y7yeRo5vGE

Help selecting noise reduction plugin for Photoshop CC 2014

https://www.dpreview.com/forums/post/54065189

Imagenomic Noiseware, Neat Image, Nik Software DFine 2, Topaz DeNoise 5, NoiseNinja

Especially with classical image processing algorithms, it is beneficial to reduce noise before applying the actual processing / analysis.
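To make that pre-filtering step concrete, below is a minimal sketch of classical whole-image noise level estimation (Immerkær's 1996 fast method), not the superpixel-based SCSM estimator from the paper above; the function and parameter names are our own:

```python
import numpy as np
from scipy.ndimage import convolve

def estimate_noise_sigma(img):
    """Fast noise standard deviation estimate (Immerkaer, 1996)."""
    mask = np.array([[1, -2, 1],
                     [-2, 4, -2],
                     [1, -2, 1]], dtype=np.float64)
    # The mask annihilates locally planar image structure, leaving mostly
    # noise; borders are cropped to avoid padding effects.
    conv = convolve(img.astype(np.float64), mask)[1:-1, 1:-1]
    h, w = img.shape
    # The constant normalizes for the mask's response to white Gaussian noise.
    return np.sqrt(np.pi / 2.0) * np.abs(conv).sum() / (6.0 * (w - 2) * (h - 2))
```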

Page 6: Image Restoration for 3D Computer Vision

Deconvolution / Deblurring

Recent Progress in Image Deblurring
Ruxin Wang, Dacheng Tao (Submitted on 24 Sep 2014)
https://arxiv.org/abs/1409.6838

http://blogs.adobe.com/photoshop/2011/10/behind-all-the-buzz-deblur-sneak-peek.html

Deconvolve the image with the Point Spread Function (PSF) that convolved the scene during image formation, in order to sharpen the image.
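A minimal sketch of this idea using Wiener deconvolution in the Fourier domain, assuming the PSF is known; the constant `k` is a stand-in for the true noise-to-signal power ratio:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=1e-2):
    """Frequency-domain Wiener deconvolution of a grayscale image.

    Larger `k` regularizes more (less noise amplification, less sharpening).
    """
    # Pad the PSF to image size and center it at the origin so that
    # multiplication in the Fourier domain corresponds to convolution.
    psf_pad = np.zeros_like(blurred, dtype=np.float64)
    ph, pw = psf.shape
    psf_pad[:ph, :pw] = psf / psf.sum()
    psf_pad = np.roll(psf_pad, (-(ph // 2), -(pw // 2)), axis=(0, 1))

    H = np.fft.fft2(psf_pad)
    Y = np.fft.fft2(blurred.astype(np.float64))
    X = np.conj(H) / (np.abs(H) ** 2 + k) * Y   # the Wiener filter
    return np.real(np.fft.ifft2(X))
```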

Page 7: Image Restoration for 3D Computer Vision

Edge-aware Image smoothing
Smooth constant patches while retaining sharp edges, instead of a “dumb” Fourier low-pass filter that destroys the edges

Deep Edge-Aware Filters
http://lxu.me/projects/deepeaf/ | http://proceedings.mlr.press/v37/xub15.html

L0 smoothing

BLF Bilateral Filter

Our method is based on a deep convolutional neural network with a gradient domain training procedure, which gives rise to a powerful tool to approximate various filters without knowing the original models and implementation details.
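For reference, a brute-force sketch of the classical bilateral filter (the BLF listed above) that such networks learn to approximate; parameter values are illustrative:

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a grayscale image scaled to [0, 1].

    Each output pixel is a weighted mean of its neighbors; the weights fall
    off with both spatial distance (sigma_s) and intensity difference
    (sigma_r), so averaging stops at sharp edges.
    """
    img = img.astype(np.float64)
    pad = np.pad(img, radius, mode="reflect")
    out = np.zeros_like(img)
    weights = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + img.shape[0],
                          radius + dx: radius + dx + img.shape[1]]
            w = (np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2)) *
                 np.exp(-(shifted - img) ** 2 / (2 * sigma_r ** 2)))
            out += w * shifted
            weights += w
    return out / weights
```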

Efficient High-Dimensional, Edge-Aware Filtering | http://doi.org/10.1109/MCG.2016.119

Hui Huang, Shihao Wu, Minglun Gong, Daniel Cohen-Or, Uri Ascher, and Hao Zhang, "Edge-Aware Point Set Resampling," ACM Trans. on Graphics (presented at SIGGRAPH 2013), Volume 32, Number 1, Article 9, 2013. [PDF | Project page with source code]

https://doi.org/10.1145/2421636.2421645
The denoising capability of the blurring-sharpening strategy based on the tooth volume (mesh). (a-d) are obtained by adding one particular type of noise, as indicated by the corresponding captions. SNR (in dB) of the noisy and the smoothed volumes are shown in each figure.

Page 8: Image Restoration for 3D Computer Vision

Super-resolution
Depending on your background, super-resolution means slightly different things.

https://www.ucl.ac.uk/super-resolution:

Super-resolution imaging allows the imaging of fluorescently labelled probes at a resolution of just tens of nanometers, surpassing classic light microscopy by at least one order of magnitude. Recent advances such as the development of photo-switchable fluorophores, high-sensitivity microscopes and single-molecule localisation algorithms make super-resolution imaging rapidly accessible to the wider life sciences research community.

At UCL we are currently taking a multidisciplinary effort to provide researchers access to super-resolution imaging systems. The Super-Resolution Facility (SuRF) currently features commercial systems supporting the PALM/STORM, SIM and STED super-resolution approaches.

Beyond diffraction-limited
Multiframe
“Statistical upsampling”, e.g. deep learning

http://www.infrared.avio.co.jp/en/products/ir-thermo/lineup/r500/index.html

http://www.robots.ox.ac.uk/~vgg/research/SR/

https://techcrunch.com/2016/06/20/twitter-is-buying-magic-pony-technology-which-uses-neural-networks-to-improve-images/

Deep Learning for Isotropic Super-Resolution from Non-Isotropic 3D Electron Microscopy
Larissa Heinrich, John A. Bogovic, Stephan Saalfeld
HHMI Janelia Research Campus, Ashburn, USA
https://arxiv.org/abs/1706.03142

Page 9: Image Restoration for 3D Computer Vision

Geometrical super-resolution
WIKIPEDIA:

Both features extend over 3 pixels but in different amounts, enabling them to be localized with precision superior to pixel dimension

Multi-exposure image noise reduction

When an image is degraded by noise, there can be more detail in the average of many exposures, even within the diffraction limit. See example on the right.

Single-frame deblurring

Known defects in a given imaging situation, such as defocus or aberrations, can sometimes be mitigated in whole or in part by suitable spatial-frequency filtering of even a single image. Such procedures all stay within the diffraction-mandated passband, and do not extend it.

Sub-pixel image localization

The location of a single source can be determined by computing the "center of gravity" (centroid) of the light distribution extending over several adjacent pixels (see figure on the left). Provided that there is enough light, this can be achieved with arbitrary precision, much better than the pixel width of the detecting apparatus and the resolution limit for deciding whether the source is single or double. This technique, which requires the presupposition that all the light comes from a single source, is at the basis of what has become known as superresolution microscopy, e.g. STORM, where fluorescent probes attached to molecules give nanoscale distance information. It is also the mechanism underlying visual hyperacuity.
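A minimal sketch of this centroid computation; the synthetic spot and the crude background removal are our own illustrative choices:

```python
import numpy as np

def subpixel_centroid(patch):
    """Center of gravity of a light distribution over a pixel patch.

    Returns (row, col) with sub-pixel precision; only valid under the
    stated assumption that all the light comes from a single source.
    """
    patch = patch.astype(np.float64)
    patch = patch - patch.min()              # crude background removal
    rows, cols = np.indices(patch.shape)
    total = patch.sum()
    return (rows * patch).sum() / total, (cols * patch).sum() / total

# A source truly centered at (2.3, 1.7), sampled on a 5x5 pixel grid:
r, c = np.mgrid[0:5, 0:5]
spot = np.exp(-((r - 2.3) ** 2 + (c - 1.7) ** 2) / 1.5)
print(subpixel_centroid(spot))               # close to (2.3, 1.7)
```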

Bayesian induction beyond traditional diffraction limit

Some object features, though beyond the diffraction limit, may be known to be associated with other object features that are within the limits and hence contained in the image. Then conclusions can be drawn, using statistical methods, from the available image data about the presence of the full object. The classical example is Toraldo di Francia's proposition of judging whether an image is that of a single or double star by determining whether its width exceeds the spread from a single star. This can be achieved at separations well below the classical resolution bounds, and requires the prior limitation to the choice "single or double?"

The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution of l2-l2 problems has been proposed (Zhao et al. 2016) and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly.

Detail-revealing Deep Video Super-resolution
Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia (Submitted on 10 Apr 2017)
https://arxiv.org/abs/1704.02738

Recent deep-learning-based video SR methods [Caballero et al. 2016; Kappeler et al. 2016] compensate inter-frame motion by aligning all other frames to the reference one, using backward warping. We show that such a seemingly reasonable technical choice is actually not optimal for video SR, and improving motion compensation can directly lead to higher quality SR results. In this paper, we achieve this by proposing a sub-pixel motion compensation (SPMC) strategy, which is validated by both theoretical analysis and extensive experiments.
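A sketch of the backward-warping alignment that the quoted passage refers to, assuming a per-pixel flow field is given; SPMC itself performs the warp at sub-pixel resolution in the high-resolution space, which this simplification omits:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(frame, flow):
    """Align a grayscale `frame` to a reference frame by backward warping.

    `flow[..., 0]` / `flow[..., 1]` hold the per-pixel (dy, dx) motion from
    the reference to `frame`; each reference pixel samples the displaced
    position in `frame` with bilinear interpolation (order=1).
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys + flow[..., 0], xs + flow[..., 1]])
    return map_coordinates(frame.astype(np.float64), coords,
                           order=1, mode="nearest")
```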

Page 10: Image Restoration for 3D Computer Vision

Optical or diffractive super-resolution
WIKIPEDIA:

Substituting spatial-frequency bands. Though the bandwidth allowable by diffraction is fixed, it can be positioned anywhere in the spatial-frequency spectrum. Dark-field illumination in microscopy is an example. See also aperture synthesis.

Multiplexing spatial-frequency bands, such as structured illumination. An image is formed using the normal passband of the optical device. Then some known light structure, for example a set of light fringes that is also within the passband, is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple, unstructured illumination does not. The “superresolved” components, however, need disentangling to be revealed.

Multiple parameter use within the traditional diffraction limit. If a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit, the other beyond it. Both would utilize normal passband transmission but are then separately decoded to reconstitute target structure with extended resolution.

Probing near-field electromagnetic disturbance. The usual discussion of superresolution involves conventional imagery of an object by an optical system. But modern technology allows probing the electromagnetic disturbance within molecular distances of the source, which has superior resolution properties; see also evanescent waves and the development of the new superlens.

Optical negative-index metamaterials
Nature Photonics 1, 41-48 (2007), doi: 10.1038/nphoton.2006.49 | Cited by 2372

Sub–Diffraction-Limited Optical Imaging with a Silver Superlens
Science 22 Apr 2005: Vol. 308, Issue 5721, pp. 534-537
doi: 10.1126/science.1108759 | Cited by 3219 articles

Optical and acoustic metamaterials: superlens, negative refractive index and invisibility cloak
Journal of Optics, Volume 19, Number 8
http://dx.doi.org/10.1088/2040-8986/aa7a1f

→Special issue on the history of metamaterials

http://zeiss-campus.magnet.fsu.edu/articles/superresolution/supersim.html

Page 11: Image Restoration for 3D Computer Vision

Inpainting
Paint over artifacts / missing values using surrounding pixels (“Clone Tool” in Photoshop), more statistically using the rest of the same image (“Content-Aware Fill”), or bigger databases, for example in deep learning pipelines.
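A minimal sketch of the simplest "fill from the surroundings" variant: iterative diffusion from the unmasked neighbors. Content-Aware-Fill-style patch statistics and learned pipelines are far richer than this:

```python
import numpy as np

def diffusion_inpaint(img, mask, n_iter=500):
    """Fill masked pixels of a grayscale image by repeated 4-neighbor averaging.

    `mask` is True where values are missing. np.roll wraps at the borders,
    which is acceptable for interior holes in this sketch.
    """
    out = img.astype(np.float64).copy()
    out[mask] = out[~mask].mean()            # rough initialization
    for _ in range(n_iter):
        avg = (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
               np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
        out[mask] = avg[mask]                # only missing pixels are updated
    return out
```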

The TUM-Image Inpainting Database
Technische Universität München
https://www.mmk.ei.tum.de/tumiid/

Context Encoders: Feature Learning by Inpainting (2016)
Deepak Pathak, Phillip Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros

http://people.eecs.berkeley.edu/~pathak/context_encoder/
Improve your skin with Inpaint

https://www.theinpaint.com/
Guillemot and Le Meur (2014)
http://dx.doi.org/10.1109/MSP.2013.2273004

Yang et al. (2017) https://arxiv.org/abs/1611.09969

Page 12: Image Restoration for 3D Computer Vision

2D Image Restoration

Page 13: Image Restoration for 3D Computer Vision

Multiframe 2D super-resolution #1
A Unified Bayesian Approach to Multi-Frame Super-Resolution and Single-Image Upsampling in Multi-Sensor Imaging
Thomas Köhler, Johannes Jordan, Andreas Maier and Joachim Hornegger
Proceedings of the British Machine Vision Conference (BMVC), pages 143.1-143.12. BMVA Press, September 2015.
https://dx.doi.org/10.5244/C.29.143

Robust Multiframe Super-Resolution Employing Iteratively Re-Weighted Minimization
Thomas Köhler, Xiaolin Huang, Frank Schebesch, André Aichert, Andreas Maier, Joachim Hornegger
IEEE Transactions on Computational Imaging, Volume 2, Issue 1, March 2016
https://doi.org/10.1109/TCI.2016.2516909

Future work should consider an adaptation of our prior to blind super-resolution, where the camera PSF is unknown, or to other image restoration problems, e.g. image deconvolution.

In this work, we limited ourselves to non-blind super-resolution, where the PSF is assumed to be known. However, iteratively re-weighted minimization could be augmented by blur estimation. Another promising extension is joint motion estimation and super-resolution, e.g. by using the nonlinear least squares algorithm. Conversely, blur and motion estimation can also benefit from being used in combination with our spatially adaptive model. One further direction of our future work is to make our approach adaptive to the scene content, e.g. by a local selection of the sparsity parameter p.
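To ground the multiframe idea, here is a minimal shift-and-add baseline assuming known sub-pixel shifts; the Bayesian and iteratively re-weighted methods above replace this plain averaging with robust, regularized estimation:

```python
import numpy as np

def shift_and_add_sr(frames, shifts, scale):
    """Classical shift-and-add multiframe super-resolution (a naive baseline).

    `frames` is a list of low-resolution (LR) images; `shifts[i]` is frame i's
    known (dy, dx) sub-pixel offset in LR pixel units. Each LR sample is
    scattered to its nearest high-resolution grid cell; cells are averaged.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    cnt = np.zeros_like(acc)
    ys, xs = np.mgrid[0:h, 0:w]
    for frame, (dy, dx) in zip(frames, shifts):
        hy = np.clip(np.round((ys + dy) * scale).astype(int), 0, h * scale - 1)
        hx = np.clip(np.round((xs + dx) * scale).astype(int), 0, w * scale - 1)
        np.add.at(acc, (hy, hx), frame)      # accumulate samples per HR cell
        np.add.at(cnt, (hy, hx), 1)
    cnt[cnt == 0] = 1                        # never-hit cells stay zero
    return acc / cnt
```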

Page 14: Image Restoration for 3D Computer Vision

Classical Image Acquisition techniques

Page 16: Image Restoration for 3D Computer Vision

Multiframe techniques or multisweep techniques #1
High Fidelity Scan Merging
Computer Graphics Forum, July 2010
http://doi.org/10.1111/j.1467-8659.2010.01773.x

For each scanned object, 3D triangulation laser scanners deliver multiple sweeps corresponding to multiple laser motions and orientations.

Scan integration as a labelling problem
Pattern Recognition, Volume 47, Issue 8, August 2014, Pages 2768-2782
https://doi.org/10.1016/j.patcog.2014.02.008

Example of overlapping scans. This head is such a complex structure that not less than 35 scans were acquired to fill in most holes.

Example of two overlapping scans: the points of each scan are first meshed separately ((c)-(d)). The result can be compared to the meshing of the points of both scans together (d).

Comparison of registration of two scans (colored in different colors on the top figure) using Global Non Rigid Alignment (middle) and scale space merging (bottom).

Comparisons of the merging (a) with a level set (Poisson Reconstruction) reconstruction method of the unmerged scans point set (b) and a filtering of the unmerged scans point set (c). The level set method obviously introduces a serious smoothing, yet does not eliminate the scanning boundary lines. The bilateral filter, applied until all aliasing artifacts have been eliminated, over-smoothes some parts of the shape.

Page 17: Image Restoration for 3D Computer Vision

Multiframe techniques or multisweep techniques #2
Density adaptive trilateral scan integration method
Bao-Quan Shi and Jin Liang
Applied Optics, Vol. 54, Issue 19, pp. 5998-6009 (2015)
https://doi.org/10.1364/AO.54.005998

Multi-Focus Image Fusion Via Coupled Sparse Representation and Dictionary Learning
Rui Gao, Sergiy A. Vorobyov (Submitted on 30 May 2017)
Aalto University, Dept. of Signal Processing and Acoustics
https://arxiv.org/abs/1705.10574

Standard 3D modeling pipeline of the commercial scanner XJTUOM

Integration of 26 partially overlapping scans of a dice model. (a) SDF method. (b) Screened Poisson method. (c) Advancing front triangulation method. (d) K-means clustering method. (e) The new method.

The new method is more robust to large gaps/registration errors than previous methods. Owing to the noise-removal property of the trilateral shifting procedure and mean-shift clustering algorithm, the new method produces much smoother surfaces.

Page 18: Image Restoration for 3D Computer Vision

Multiframe techniques or multisweep techniques #3
Crossmodal point cloud registration in the Hough space for mobile laser scanning data
Bence Gálai, Balázs Nagy, Csaba Benedek
Pattern Recognition (ICPR), 2016
https://doi.org/10.1109/ICPR.2016.7900155

Top row: Point clouds of three different vehicle mounted Lidar systems (Velodyne HDL64 and VLP16 I3D scanners, and a Riegl VMX450 MMS), captured from the same scene at Fővám Tér, Budapest. Bottom row: segmentation results for each cloud by our proposed method

Page 19: Image Restoration for 3D Computer Vision

Multiframe techniques or multisweep techniques #4
Frame Rate Fusion and Upsampling of EO/LIDAR Data for Multiple Platforms
T. Nathan Mundhenk, Kyungnam Kim, Yuri Owechko
Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE
https://doi.org/10.1109/CVPRW.2014.117

The left pane shows the PanDAR demonstrator sensors with the red Ladybug sensor mounted over the silver Velodyne 64E LIDAR. A custom aluminum scaffold connects the two sensors. The right pane shows the graphical interface with displays of the 3D model in the top, help menus and the depth map at the bottom.

Multithreaded programming and GP-GPU methods allow us to obtain 10 fps with a Velodyne 64E LIDAR completely fused in 360° using a Ladybug panoramic camera.

PanDAR: a wide-area, frame-rate, and full color lidar with foveated region using backfilling interpolation upsampling
T. Nathan Mundhenk, Kyungnam Kim, Yuri Owechko
Proceedings Volume 9406, Intelligent Robots and Computer Vision XXXII: Algorithms and Techniques; 94060K (2015)
Event: SPIE/IS&T Electronic Imaging, 2015, San Francisco, California, United States
http://dx.doi.org/10.1117/12.2078348

Page 20: Image Restoration for 3D Computer Vision

Multiframe techniques or multisweep techniques #5
Upsampling method for sparse light detection and ranging using coregistered panoramic images
Ruisheng Wang, Frank P. Ferrie
J. of Applied Remote Sensing, 9(1), 095075 (2015)
http://dx.doi.org/10.1117/1.JRS.9.095075

See-through problem and invalid light detection and ranging (LiDAR, Velodyne HDL-64E) points returned from building interior. (a) Camera image rendered from a certain viewpoint, (b) corresponding LiDAR image rendered from the same viewpoint of (a), (c) corresponding LiDAR image rendered from a top-down viewpoint

“There are a number of improvements that are possible and are topics for future work. The initial depth ordering that is used to determine visibility assumes a piecewise planar partition of the scene. While this can suffice for the urban environment considered here, a more general approach would consider a richer form of representation, e.g., using statistical modeling methods. Cues that are available in the coregistered intensity data, such as the loci of occluding contours, could also be exploited. At present, our interpolation strategy samples image space to determine connectivity and backprojects to 3-D, resulting in a nonuniform interpolation. A better solution would be to perform the sampling in 3-D by backprojecting the 2-D boundary and forming a 3-D bounding box that could then be interpolated at the desired resolution. In the limit, true multimodal analysis would consider the joint distribution of both intensity and depth information with the aim of inferring more detailed interpolation functions. With the availability of sophisticated platforms such as Navteq True, there is clearly an incentive to move in these directions.”

Page 21: Image Restoration for 3D Computer Vision

Classical Image Restoration for Laser Scans

Laser Scanner Super-resolution
https://doi.org/10.2312/SPBG/SPBG06/009-015 | Cited by 47 articles

Page 22: Image Restoration for 3D Computer Vision

Point cloud processing Reviews
Point Cloud Processing
Raphaële Héno and Laure Chandelier
in 3D Modeling of Buildings, Chapter 5 (2014)
http://doi.org/10.1002/9781118648889.ch5

A review of algorithms for filtering the 3D point cloud
Signal Processing: Image Communication, Volume 57, September 2017, Pages 103-112
https://doi.org/10.1016/j.image.2017.05.009

Octree structuring: point cloud and different levels of the hierarchical grid

Example of significant noise on the profile view of a target: the standard deviation for a point cloud at target level is 8 mm

A brief discussion of future research directions is presented as follows.

1) Combination of color and geometric information: For point clouds, especially those containing color information, a method based purely on color or purely on geometric attributes cannot work well. Hence, combining color and geometric information in the filtering process is expected to further increase the performance of a filtering scheme.

2) Time complexity reduction: Because point clouds can contain hundreds of thousands or even millions of points, computation on them is time consuming. It is necessary to develop filtering technologies that process point clouds effectively while reducing time complexity.

3) Filtering on point cloud sequences: Since object recognition from point cloud sequences will become a future research direction, filtering the point cloud sequence will help to improve the performance and accuracy of object recognition.
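As a concrete baseline for the geometric filtering surveyed above, a minimal statistical outlier removal sketch (similar in spirit to PCL's StatisticalOutlierRemoval); note that it uses geometry only, which is exactly the limitation that direction 1) points at:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors exceeds
    the global mean by more than `std_ratio` standard deviations.
    """
    tree = cKDTree(points)
    # k+1 neighbors are queried because each point is its own nearest one.
    dists, _ = tree.query(points, k=k + 1)
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]
```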

Page 24: Image Restoration for 3D Computer Vision

Point cloud denoising #1
Similarity based filtering of point clouds
Julie Digne
Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE
https://doi.org/10.1109/CVPRW.2012.6238917

Photogrammetric DSM denoising
Nex, F.; Gerke, M.
The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-3, pages 231-238. Göttingen: Copernicus GmbH (2014)
http://dx.doi.org/10.5194/isprsarchives-XL-3-231-2014

Differences between ground truth and noisy DSM

Photogrammetric Digital Surface Models (DSM) are usually affected by both random noise and gross errors. These errors are generally concentrated in occluded or shadowed areas and are strongly influenced by the texture of the object under consideration and by the number of images employed for the matching.

In the future, further tests will be performed on other real DSMs in order to assess the reliability of the developed method under very different operative conditions. Then the extension from the 2.5D case to fully 3D will be carried out, as well as further comparisons with other available denoising algorithms.

In addition, a key feature of our method is that it is independent of a surface mesh: it can work directly on point clouds, which is useful, since building a mesh of a noisy point cloud is never easy, whereas building a mesh of a properly denoised shape is well understood. A possible extension for this work would be to use the filter as a projector onto the surface, in a spirit similar to [Lipman et al. 2007] for example.

Page 25: Image Restoration for 3D Computer Vision

Point cloud denoising #2
Point Cloud Denoising via Moving RPCA
E. Mattei, A. Castrodad
Computer Graphics Forum (2016). doi: 10.1111/cgf.13068

Guided point cloud denoising via sharp feature skeletons
Yinglong Zheng, Guiqing Li, Shihao Wu, Yuxin Liu, Yuefang Gao
The Visual Computer, June 2017, Volume 33, Issue 6–8, pp 857–867
https://doi.org/10.1007/s00371-017-1391-8

Denoising synthetic datasets of two planes meeting at increasingly shallow angles (20.4K points) with added Gaussian noise of standard deviation equal to 1% of the length of the bounding box diagonal. The two planes meet at an angle of 140º, 150º and 160º. The first and second rows show the noisy 3D data and 2D transects, respectively. Rows 3–5 show the results of the bilateral filter, AWLOP and MRPCA.

Denoising of the Vienna cathedral SfM model. The noisy input was processed with MRPCA followed by a simple outlier removal method using Meshlab.

Although MRPCA is robust against outliers, this robustness is achieved only locally. One simple modification to achieve global outlier robustness is to use an l1 data fitting term in problem (P6). Although the l1-norm will be able to handle the global outliers better than the Frobenius norm used in this work, the computational cost will increase significantly. We point out that the size of the neighbourhoods is set globally. One improvement over the current method could be to make the neighbourhood size a function of the local point density. This could have a positive effect when handling datasets with spatially varying noise.

Page 26: Image Restoration for 3D Computer Vision

Point cloud Edge Detection #1
Fast and Robust Edge Extraction in Unorganized Point Clouds
Dena Bazazian, Josep R. Casas, Javier Ruiz-Hidalgo
Digital Image Computing: Techniques and Applications (DICTA), 2015
https://doi.org/10.1109/DICTA.2015.7371262

Page 27: Image Restoration for 3D Computer Vision

Point cloud Edge Detection #2
Segmentation-based Multi-Scale Edge Extraction to Measure the Persistence of Features in Unorganized Point Clouds
Dena Bazazian, Josep R. Casas, Javier Ruiz-Hidalgo
12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Porto: 2017, p. 317-325.
http://dx.doi.org/10.5220/0006092503170325

Estimating the neighbors of a sample point on the ear of bunny at large scales: (a) far away neighbors may belong to foreign surfaces when Euclidean distance is used; (b) geodesic distance is a better choice to explore large local neighborhoods; (c) the point cloud can be segmented to distinguish different surfaces.
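A sketch of the eigenvalue analysis at the core of these edge extraction methods: the surface-variation score from the covariance of the k nearest neighbors (the multi-scale, geodesic and segmentation refinements of the papers above are not reproduced):

```python
import numpy as np
from scipy.spatial import cKDTree

def edge_scores(points, k=16):
    """Per-point surface variation from local covariance eigenvalues.

    With eigenvalues l1 <= l2 <= l3 of the k-neighborhood covariance, the
    score l1 / (l1 + l2 + l3) is ~0 on flat surfaces and grows near sharp
    edges and corners.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    scores = np.empty(len(points))
    for i, nbrs in enumerate(idx):
        nb = points[nbrs]
        cov = np.cov((nb - nb.mean(axis=0)).T)   # 3x3 covariance matrix
        evals = np.linalg.eigvalsh(cov)          # ascending eigenvalues
        scores[i] = evals[0] / evals.sum()
    return scores
```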

Page 28: Image Restoration for 3D Computer Vision

Point cloud super-resolution #1
LidarBoost: Depth superresolution for ToF 3D shape scanning
Sebastian Schuon, Christian Theobalt, James Davis, Sebastian Thrun
Computer Vision and Pattern Recognition, 2009. CVPR 2009
https://doi.org/10.1109/CVPR.2009.5206804

A new upsampling method for mobile LiDAR data
Ruisheng Wang, Jeff Bach, Jane Macfarlane, Frank P. Ferrie
Applications of Computer Vision (WACV), 2012 IEEE
https://doi.org/10.1109/WACV.2012.6162998

Real scene - wedges and panels (a): This scene with many depth edges (b) demonstrates the true resolution gain. Image-based super-resolution (IBSR) (c) demonstrates increased resolution at the edges, but some aliasing remains and the strong pattern in the interior persists. LidarBoost (d) reconstructs the edges much more clearly and there is hardly a trace of aliasing, also the depth layers visible in the red encircled area are better captured. Markov-Random-Field (MRF) upsampling (e) oversmooths the depth edges and in some places allows the low resolution aliasing to persist.

The main contributions of this paper are

● A 3D depth sensor superresolution method that incorporates ToF-specific knowledge and data. Additionally, a new 3D shape prior is proposed that enforces 3D-specific properties.

● A comprehensive evaluation of the working range and accuracy of our algorithm using synthetic and real data captured with a ToF camera.

● Only a few depth superresolution approaches have been developed previously. We show that our algorithm clearly outperforms the most closely related approaches.

Page 29: Image Restoration for 3D Computer Vision

Point cloud super-resolution #2
Geometry Super-Resolution by Example
Thales Vieira, Alex Bordignon, Thomas Lewiner, Luiz Velho
Computer Graphics and Image Processing (SIBGRAPI), 2009
https://doi.org/10.1109/SIBGRAPI.2009.10

Laser stripe model for sub-pixel peak detection in real-time 3D scanning
Ingmar Besic, Zikrija Avdagic
Systems, Man, and Cybernetics (SMC), 2016 IEEE
https://doi.org/10.1109/SMC.2016.7844912

Our tests show that noise does not vary significantly when observed on different color channels. Thus estimator algorithms can utilize any of the color channels without sacrificing precision due to significantly increased noise. However, if a clever choice is to be made, the estimator should opt for the green channel, as it provides the most reliable stripe intensity data for both black and white surfaces across the whole modulation ROI. We have found that the noise does not follow a continuous uniform distribution but a normal distribution, and we proposed a model that fits the empirical data. Our measurements support the assumption that the laser stripe image has an approximately Gaussian intensity profile. However, RMSE values show that a single Gaussian curve fit is not the best choice, as the stripe intensity profile is superposed with surface reflections. After testing Gaussian fits with 1 to 8 curves, we concluded that models with n > 2 are not suitable, as they produce false peaks or subtract light intensity to achieve a better RMSE. Thus we proposed a Gaussian fit with two curves as the optimal model based on the empirical data.

Our future work is to target the relations between the coefficients of the proposed laser stripe intensity profile and to reduce their number if possible. Preliminary tests show that the mean values b1 and b2 of the Gaussian curves tend to be equal or to differ by a subpixel amount. It is not yet clear whether this difference has any significance or can be neglected. We also intend to test our model with different angles between the laser source and the target surface.
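For comparison, a sketch of the classical single-Gaussian three-point sub-pixel peak estimator, which is much simpler than the two-curve Gaussian model the paper proposes:

```python
import numpy as np

def gaussian_subpixel_peak(profile):
    """Sub-pixel peak location of a roughly Gaussian 1D intensity profile.

    Fits a parabola to the log-intensities around the brightest sample
    (exact for a pure Gaussian profile).
    """
    i = int(np.argmax(profile))
    if i == 0 or i == len(profile) - 1:
        return float(i)                          # peak on the boundary
    lm, l0, lp = np.log(profile[i - 1:i + 2] + 1e-12)
    denom = lm - 2.0 * l0 + lp
    if denom >= 0:                               # degenerate / flat profile
        return float(i)
    return i + 0.5 * (lm - lp) / denom

# A stripe centered at 10.3 px, sampled at integer pixels:
x = np.arange(21)
print(gaussian_subpixel_peak(np.exp(-(x - 10.3) ** 2 / 4.0)))  # ~10.3
```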

The proposed method is limited to models which have repeated occurrences of a shape, and restrict the resolution increase to the regions of those occurrences. Increasing the resolution of other parts would require inpainting-like tools to extrapolate the geometry [Sharf et al. 2004], together with a superresolution scheme as an extension of the one proposed here.

Page 30: Image Restoration for 3D Computer Vision

Point cloud super-resolution #3
Super-Resolution of Point Set Surfaces Using Local Similarities
Azzouz Hamdi-Cherif, Julie Digne, Raphaëlle Chaine
Computer Graphics Forum, 2017
http://dx.doi.org/10.1111/cgf.13216

Super-resolution of a single scan of the Maya point set. Left: initial scan, right: super-resolution. For visualization purposes, both are reconstructed with Poisson reconstruction

Super-resolution of an input shape with highly repetitive geometric texture. (a) Underlying shape to be sampled by an acquisition device. (b) Low-resolved input sampling of the shape and local approximation with a quadric at each point; geometric texture is the residue over the quadric. (c) Super-resolved re-sampling using our method (fusion of super-resolved local patches). Right column: Generation of the super-resolved patches. (d) Construction of a local descriptor of the residue over a low-resolution grid corresponding to the unfolded quadric; blue points represent the height values estimated at bin centres, red points are the input points. (e) Similar descriptor points are added (orange points) to the input points (in red) of the local descriptor. (f) A super-resolved descriptor is computed from the set of red and orange points

Super-resolution of a single scan of the Persepolis point set. Left: initial scan, right: super-resolution. The shape details appear much sharper after the super-resolution process. Parameters: r = 4 (shape diagonal: 114), nbins_lr = 64, nbins_sr = 400 and r_sim = 0.2.

Page 31: Image Restoration for 3D Computer Vision

Point cloud Classification #1

Page 32: Image Restoration for 3D Computer Vision

Point cloud Classification #2
Multi-class US traffic signs 3D recognition and localization via image-based point cloud model using color candidate extraction and texture-based recognition
Vahid Balali, Arash Jahangiri and Sahar Ghanipoor Machiani
Advanced Engineering Informatics, Volume 32, April 2017, Pages 263-274
https://doi.org/10.1016/j.aei.2017.03.006

An improved Structure-from-Motion (SfM) procedure is developed to create a clean 3D point cloud from the street level imagery and assist with accurate 3D localization by color and texture feature extraction. The detected traffic signs are triangulated using camera pose information and their corresponding locations are visualized in a 3D environment. The proposed method, as shown in Fig. 1, mainly consists of three key components:

1) Detecting and classifying traffic signs using 2D images;

2) Reconstructing and automatically cleaning a 3D point cloud model; and

3) Recognizing and localizing traffic signs in a 3D environment.

Page 33: Image Restoration for 3D Computer Vision

Point cloud Clustering for simplification #1
Adaptive simplification of point cloud using k-means clustering
Bao-Quan Shi, Jin Liang, Qing Liu
Computer-Aided Design, Volume 43, Issue 8, August 2011, Pages 910-922
https://doi.org/10.1016/j.cad.2011.04.001

A parallel point cloud clustering algorithm for subset segmentation and outlier detection
Christian Teutsch, Erik Trostmann, Dirk Berndt
Proceedings Volume 8085, Videometrics, Range Imaging, and Applications XI; 808509 (2011)
http://dx.doi.org/10.1117/12.888654

Cluster initialization of the Stanford bunny. Left: input data. Middle: initialization of the cluster centroids. Right: initial clusters are formed; each cluster is shown in one color.

If the noise of the 3D point cloud is serious, effective noise filtering should be conducted before the simplification. The proposed method can also simplify multiple 3D point clouds. Our future research will concentrate on simplifying multiple 3D point sets simultaneously.

For example, a point set with two million coordinates is analyzed within three seconds, and 15 million points within 35 seconds, on an Intel Core2 processor. It handles arbitrary n-dimensional data formats, e.g. with additional color and/or normal vector information, since it is implemented as a template class. The algorithm is easy to parallelize, which further increases the computation performance on multi-core machines for most applications. The feasibility of our clustering technique has been evaluated on a variety of point clouds from different measuring applications and 3D scanning devices.
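A minimal sketch of simplification by k-means clustering, replacing each cluster of raw points with its centroid; the cited method additionally adapts the cluster size to local curvature, which this plain version omits:

```python
import numpy as np

def kmeans_simplify(points, n_clusters=500, n_iter=20, seed=0):
    """Simplify a point cloud to `n_clusters` centroid points via k-means.

    Uses brute-force distance computation (fine for small clouds).
    """
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest center
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers
```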

Page 34: Image Restoration for 3D Computer Vision

Point cloud Object detection #1
Object Detection in Point Clouds Using Conformal Geometric Algebra
Aksel Sveier, Adam Leon Kleppe, Lars Tingelstad and Olav Egeland
Advances in Applied Clifford Algebras, 2017
http://dx.doi.org/10.1007/s00006-017-0759-1

In this paper we focus on the detection of primitive geometric models in point clouds using RANSAC. A central step in the RANSAC algorithm is to classify inliers and outliers. We show that conformal geometric algebra (CGA) enables filters with a geometrical interpretation for inlier/outlier classification. The last step of the RANSAC algorithm is fitting the primitive to its inliers. This can be performed analytically with CGA, and the method is identical for both planes and spheres.

Setup of the robotic pick-and-place demonstration. Point clouds from the 3D camera are used for detecting the plane, spheres and cylinder. The information is sent to the robot arm, which is used to place the spheres in the cylinder.

Spheres were successfully detected in point clouds with up to 90% outliers and cylinders could successfully be detected in point clouds with up to 80% outliers. We suggested two methods for constructing a cylinder from point data using CGA and found that fitting two spheres to a cylinder gave performance advantages compared to constructing a circle and line from 3 points on the cylinder surface.
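For reference, a plain linear-algebra RANSAC plane detector with a Euclidean inlier test; the CGA formulation above replaces both the inlier filter and the final fit with conformal-geometric-algebra operations:

```python
import numpy as np

def ransac_plane(points, n_iter=200, tol=0.01, seed=0):
    """Detect the dominant plane in an N x 3 point cloud with RANSAC."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                     # degenerate (collinear) sample
            continue
        n /= norm
        inliers = np.abs((points - p0) @ n) < tol   # point-to-plane distance
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit to all inliers via least squares (SVD of centered coordinates)
    pts = points[best_inliers]
    centroid = pts.mean(axis=0)
    normal = np.linalg.svd(pts - centroid)[2][-1]
    return centroid, normal, best_inliers
```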

Page 35: Image Restoration for 3D Computer Vision

Point cloud Compression static #1
Research on the Self-Similarity of Point Cloud Outline for Accurate Compression
Xuandong An, Xiaoqing Yu, Yifan Zhang
2015 International Conference on Smart and Sustainable City and Big Data (ICSSC)
http://dx.doi.org/10.1049/cp.2015.0272

The Lovers of Bordeaux (15.8 million points). Exploiting self-similarity in the model, we compress this representation down to 1.15 MB. The resulting model (right) is very close to the original one (left), as the reconstruction error is less than the laser scanner precision (0.02mm) for 99.14% of the input points.

Point cloud compression approaches have mostly dealt with coordinate quantization via recursive space partitioning [Gandoin and Devillers 2002; Schnabel and Klein 2006; Huang et al. 2006; Smith et al. 2012]. In a nutshell, these approaches consist in inserting the points into a space partitioning data structure (e.g. octree, kd-tree) of a given depth, and replacing them by the center of the cell they belong to.
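A minimal sketch of that quantization idea on a flat uniform grid (a recursive octree refines the same principle adaptively); the cell size is an illustrative parameter:

```python
import numpy as np

def quantize_to_grid(points, cell_size=0.01):
    """Replace points by the centers of the occupied cells of a uniform grid.

    Only the set of occupied cells needs to be stored, which is the basic
    source of compression in space-partitioning coders.
    """
    origin = points.min(axis=0)
    cells = np.floor((points - origin) / cell_size).astype(np.int64)
    occupied = np.unique(cells, axis=0)      # one entry per occupied cell
    return origin + (occupied + 0.5) * cell_size
```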

Self-similarity of measured signals has gained interest over the past decade: research on signal processing as well as image processing has accomplished outstanding progress by taking advantage of the self-similarity of the measurement. In the image processing field, the idea originated in the non-local means algorithm [Buades et al. 2005]: instead of denoising a pixel using its neighboring pixels, it is denoised by exploiting pixels of the whole image that look similar. The similarity between pixels is computed by comparing patches around them. Behind this powerful tool lies the idea that pixels far away from the considered area might carry information that helps processing it, because of the natural self-similarity of the image.

Self-similarity of surfaces has mainly been exploited for surface denoising applications: the non-local means filter has been adapted for surfaces, be they meshes [Yoshizawa et al. 2006] or point clouds [Adams et al. 2009; Digne 2012]. It was also used to define a Point Set Surface variant [Guillemot et al. 2012] exhibiting better robustness to noise. Self-similarity of surfaces is obviously not limited to denoising purposes. For example, analyzing the similarity of a surface can lead to detecting symmetries or repetition structures in surfaces [Mitra et al. 2006; Pauly et al. 2008]. An excellent survey of methods exploiting symmetry in shapes can be found in [Mitra et al. 2013].

There are several ways in which our compression scheme could be improved:

● With a patch-based representation, artifacts may appear near boundaries, which could be dilated during decompression. One could mitigate this issue by adjusting the patch size (clipping some outer grid cells) along boundaries. This would require storing one small integer for each patch, at a small cost.

● Other seed picking strategies could be implemented, for example by placing the seeds so that they minimize the local error, in the spirit of [Ohtake et al. 2006].

● Encoding per-point attributes such as normals and colors is possible with the same similarity-based coder.

Perspectives: Although our algorithm is based on the exploitation of self-similarity on the whole surface, most of the involved treatments remain local. This is a good prospect for handling data of ever increasing size, using streaming processes. This is particularly important at a time when the geometric digitization campaigns sometimes cover entire cities.

Page 36: Image Restoration for 3D Computer Vision

Point cloud Compression Dynamic #1: voxelized
Motion-Compensated Compression of Dynamic Voxelized Point Clouds
Ricardo L. de Queiroz, Philip A. Chou
IEEE Transactions on Image Processing, Volume 26, Issue 8, Aug. 2017
https://doi.org/10.1109/TIP.2017.2707807

As a new concept for a new application, much still has to be fine-tuned and perfected. For example, the post-processing (in-loop or otherwise) is far from reaching its peak performance. Both the morphological and the filtering operations are not well understood in this context. Similarly, the distortion metrics and the voxel matching methods are not developed to a satisfactory point. There is still plenty of work to be done to extend the present framework to use B-frames (bidirectional prediction) and to extend the GOF to a more typical IBBPBBP... format. Furthermore, we want to use adaptive block sizes, which are optimally selected in an RD sense, and we also want to encode both the geometry and the color residues for the predicted (P and B) blocks. Finally, rather than re-using the correspondences from the surface reconstruction among consecutive frames, we want to develop efficient motion estimation methods for use with our coder. Each of these enhancements should improve the coder performance, such that there is a continuous sequence of improvements in this new frontier to be explored.

Page 37: Image Restoration for 3D Computer Vision

Point cloud Compression Dynamic #2
Graph-Based Compression of Dynamic 3D Point Cloud Sequences
Dorina Thanou, Philip A. Chou, Pascal Frossard
IEEE Transactions on Image Processing, Volume 25, Issue 4, April 2016
https://doi.org/10.1109/TIP.2016.2529506

Example of a point cloud of the ‘yellow dress’ sequence (a). The geometry is captured by a graph (b) and the r component of the color is considered as a signal on the graph (c). The size and the color of each disc indicate the value of the signal at the corresponding vertex.

Octree decomposition of a 3D model for two different depth levels. The points belonging to each voxel are represented by the same color.

There are a few directions that can be explored in the future. First, it has been shown in our experimental section that a significant part of the bit budget is spent on the compression of the 3D geometry, which, given a particular depth of the octree, is lossless. A lossy compression scheme that permits some errors in the reconstruction of the geometry could bring non-negligible benefits in terms of the overall rate-distortion performance. Second, the optimal bit allocation between geometry, color and motion vector data remains an interesting and open research problem, mainly due to the lack of a suitable metric that balances geometry and color visual quality. Third, the estimation of the motion is done by computing features based on the spectral graph wavelet transform. Features based on data-driven dictionaries, such as the ones proposed in [Thanou et al. 2014], are expected to significantly improve the matching, and consequently the compression performance.

Page 38: Image Restoration for 3D Computer Vision

Dynamic meshes Laplace operator
A 3D+t Laplace operator for temporal mesh sequences
Victoria Fernández Abrevaya, Sandeep Manandhar, Franck Hétroy-Wheeler, Stefanie Wuhrer
Computers & Graphics, Volume 58, August 2016, Pages 12-22
https://doi.org/10.1016/j.cag.2016.05.018

In this paper we have introduced a discrete Laplace operator for temporally coherent mesh sequences. This operator is defined by modelling the sequences as CW complexes in a 4-dimensional Riemannian space and using Discrete Exterior Calculus. A user-defined parameter α is associated with the 4D space to control the influence of motion with respect to the geometry. We have shown that this operator can be expressed by a sparse blockwise tridiagonal matrix, with a linear number of non-zero coefficients with respect to the number of vertices in the sequence. The storage overhead with respect to frame-by-frame mesh processing is limited. We have also shown an application example, as-rigid-as-possible editing, for which it is relatively easy to extend the classical static Laplacian framework to mesh sequences with this matrix. Results similar to state-of-the-art methods can be reached with a simple, global formulation.

This opens the possibility of many other problems in animation processing to be tackled the same way by taking advantage of the existing literature on the Laplacian operator for 3D meshes [Zhang et al. 2010]. In the future, we are in particular interested in studying the spectral properties of the defined discrete Laplace operator.

Page 39: Image Restoration for 3D Computer Vision

Point cloud inpainting #1
Region of interest (ROI) based 3D inpainting
Shankar Setty, Himanshu Shekhar, Uma Mudenagudi
Proceeding SA '16, SIGGRAPH ASIA 2016 Posters, Article No. 33
https://doi.org/10.1145/3005274.3005312

Point Cloud Data Cleaning and Refining for 3D As-Built Modeling of Built Infrastructure
Abbas Rashidi and Ioannis Brilakis
Construction Research Congress 2016
http://sci-hub.cc/10.1061/9780784479827.093

Future experiments will also be required to quantitatively measure the accuracy of the presented algorithms, especially for the case of outlier removal. Developing robust algorithms for automatically recognizing 3D objects throughout the built infrastructure PCD, and thereby enhancing the object-oriented modeling stage, is another possible direction for future research.

Page 40: Image Restoration for 3D Computer Vision

Point cloud inpainting #2A

Dynamic occlusion detection and inpainting of in situ captured terrestrial laser scanning point clouds sequence
Chi Chen, Bisheng Yang
ISPRS Journal of Photogrammetry and Remote Sensing (2016)
https://doi.org/10.1016/j.isprsjprs.2016.05.007

In future work, the proposed method will be extended to incorporate multiple geometric features (e.g. shape index, normal vector https://github.com/aboulch/normals_Hough) of local point distributions to measure the geometric consistency in the background modeling stage, aiming for higher recall of the background points during inpainting.

Page 42: Image Restoration for 3D Computer Vision

Point cloud Quality assessment #1
Towards Subjective Quality Assessment of Point Cloud Imaging in Augmented Reality
Alexiou, Evangelos; Upenik, Evgeniy; Ebrahimi, Touradj
IEEE 19th International Workshop on Multimedia Signal Processing, Luton, Bedfordshire, United Kingdom, October 16-18, 2017

https://infoscience.epfl.ch/record/230115

On the performance of metrics to predict quality in point cloud representations
Alexiou, Evangelos; Ebrahimi, Touradj
SPIE Optics + Photonics for Sustainable Energy, San Diego, California, USA, August 6-10, 2017

https://infoscience.epfl.ch/record/230116

As can be observed, our results show strong correlation between objective metrics and subjective scores in the presence of Gaussian noise. The statistical analysis shows that the current metrics perform well when Gaussian noise is introduced. However, in the presence of compression-like artifacts the performance is lower for every type of content, leading to the conclusion that the performance is content dependent. Our results show that there is a need for better objective metrics that can more accurately predict all practical types of distortions for a wide variety of contents.

Absolute category rating (ACR); double-stimulus impairment scale (DSIS)

Page 43: Image Restoration for 3D Computer Vision

Point cloud Quality assessment #2
A statistical method for geometry inspection from point clouds
Francisco de Asís López, Celestino Ordóñez, Javier Roca-Pardiñas, Silverio García-Cortés
Applied Mathematics and Computation, Volume 242, 1 September 2014, Pages 562-568
https://doi.org/10.1016/j.amc.2014.05.130

Assessing planar asymmetries in shipbuilding from point clouds
Javier Roca-Pardiñas, Celestino Ordóñez, Carlos Cabo, Agustín Menéndez-Díaz
Measurement, Volume 100, March 2017, Pages 252-261
https://doi.org/10.1016/j.measurement.2016.12.048

In this paper, a statistical test to perform geometry inspection is described. The methodology used allows obtaining, by means of bootstrapping techniques, a p-value for the statistical hypothesis established.

An important aspect of the developed methodology, proved by means of a simulated experiment, is its capacity to control type I errors while it is able to reject the null hypothesis when it is false. This experiment showed that the performance of the method improves when the point density increases.

The proposed method was applied to the inspection of a parabolic dish antenna, and the results show that it does not fit its theoretical shape, unless a 1 mm tolerance is admitted.

It is noteworthy that although the method has been exposed as a global test for geometry inspection, it would also be possible to apply it to inspect different parts of the object under study.
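A generic sketch of such a bootstrap test, assuming point-to-theoretical-surface residuals are given; this follows the spirit of the cited method, not its exact statistic:

```python
import numpy as np

def bootstrap_deviation_pvalue(residuals, tolerance, n_boot=2000, seed=0):
    """Bootstrap p-value for H0: mean absolute deviation <= `tolerance`."""
    rng = np.random.default_rng(seed)
    obs = np.abs(residuals).mean()
    # resample under H0 by recentering the absolute deviations at the tolerance
    centered = np.abs(residuals) - obs + tolerance
    n = len(residuals)
    boot = np.array([rng.choice(centered, size=n, replace=True).mean()
                     for _ in range(n_boot)])
    return (boot >= obs).mean()              # fraction of resamples >= observed
```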

Yacht hull surface estimated from the point cloud.

Page 44: Image Restoration for 3D Computer Vision

Point cloud Quality assessment #3: Defect detection
Automated Change Diagnosis of Single-Column-Pier Bridges Based on 3D Imagery Data
Ying Shi; Wen Xiong, Ph.D., P.E., M.ASCE; Vamsisai Kalasapudi; Chao Geng
ASCE International Workshop on Computing in Civil Engineering 2017
http://doi.org/10.1061/9780784480830.012

The future work will include understanding the correlation between the deformation of the girder and column with the change in the thickness of the connected bearing. Such correlated change analysis will aid in understanding the cause of the observed thickness variation and performing reliable condition diagnosis of all the single pier bridges.

Page 45: Image Restoration for 3D Computer Vision

Point cloud Quality assessment #4: with uncertainty
Point cloud comparison under uncertainty. Application to beam bridge measurement with terrestrial laser scanning
Francisco de Asís López, Celestino Ordóñez, Javier Roca-Pardiñas, Silverio García-Cortés
Measurement, Volume 51, May 2014, Pages 259-264
https://doi.org/10.1016/j.measurement.2014.02.013

Assessment of along-normal uncertainties for application to terrestrial laser scanning surveys of engineering structures
Tarvo Mill, Artu Ellmann
Survey Review (2017)

http://dx.doi.org/10.1080/00396265.2017.1361565

Future studies should more closely investigate how the results depend on the different TLS signal processing methods, as well as the applicability of combined standard uncertainty (CSU; Bjerhammar 1973; Niemeier and Tengen 2017) equations that also consider systematic error in TLS surveys.

The application of the proposed methodology to compare two point clouds of a beam bridge measured with two different scanner systems, showed significant differences in parts of the beam. This is important in inspection works since different conclusions could be reached depending on the measuring instrument.

Page 46: Image Restoration for 3D Computer Vision

PDE-based Point cloud processing
Partial Difference Operators on Weighted Graphs for Image Processing on Surfaces and Point Clouds
François Lozes, Abderrahim Elmoataz, Olivier Lézoray
IEEE Transactions on Image Processing, Volume 23, Issue 9, Sept. 2014
https://doi.org/10.1109/TIP.2014.2336548

PDE-Based Graph Signal Processing for 3-D Color Point Clouds: Opportunities for cultural heritage
François Lozes, Abderrahim Elmoataz, Olivier Lézoray
IEEE Signal Processing Magazine, Volume 32, Issue 4, July 2015
https://doi.org/10.1109/MSP.2015.2408631

The approach allows processing of signal data on point clouds (e.g., spectral data, colors, coordinates, and curvatures). We have applied this approach for cultural heritage purposes on examples aimed at restoration, denoising, hole-filling, inpainting, object extraction, and object colorization.

Page 47: Image Restoration for 3D Computer Vision

Sparse coding and point clouds #1
Cloud Dictionary: Sparse Coding and Modeling for Point Clouds
Or Litany, Tal Remez, Alex Bronstein
(Submitted on 15 Dec 2016 (v1), last revised 20 Mar 2017 (this version, v2))
https://arxiv.org/abs/1612.04956

Sparse Geometric Representation Through Local Shape Probing
Julie Digne, Sébastien Valette, Raphaëlle Chaine
(Submitted on 7 Dec 2016)
https://arxiv.org/abs/1612.02261

With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving. Parsimony-based algorithms have shown great success on images and videos, where data points are sampled on a regular Cartesian grid. We propose an adaptation of these techniques to irregularly sampled signals by using continuous dictionaries. We present an example application in the form of point cloud denoising.

Page 48: Image Restoration for 3D Computer Vision

Building Information models (BIM) and point clouds
An IFC schema extension and binary serialization format to efficiently integrate point cloud data into building models
Thomas Krijnen, Jakob Beetz
Advanced Engineering Informatics, Available online 3 April 2017
https://doi.org/10.1016/j.aei.2017.03.008

Building elements can be represented by various forms of geometry, including 2D and 3D line drawings, Constructive Solid Geometry (CSG), Boundary Representations (BRep) and tessellated meshes. However, these three-dimensional representations are just one of the many aspects conveyed in an IFC model. In addition, attributes related to thermal or acoustic performance, costing or intended use of spaces, etc. can be added.

In many common data formats for the storage of point cloud data, such as E57 and PCD, metadata is attached to individual data sets. This metadata for example includes scanner positions or weather conditions that are perceived during the scan. From the acquisition process, the point data itself contains no grouping, decomposition or other information that relates the points to the semantic meaning of the real-world object that was scanned. In subsequent processing steps such labels are often added to the points. Several exchange formats, such as LAS, have options to store labels along with the points.

The magnitude of the data typically found in point cloud data sets and IFC model populations can be dramatically different for the two file types. A meaningful IFC file can have file sizes in the order of a few megabytes, if geometrical representations and property values are properly reused, and especially when the file contains implicit, parametric, rather than tessellated geometry. Depending on the amount of detail and precision, point cloud scans can easily amount to gigabytes of data. Despite the larger size, due to the uniform structure and explicit nature, point clouds can typically be explored more immediately than IFC building models, for which the boolean operations and implicit geometries need to be evaluated prior to visualization.

The need for a unified and harmonized storage model for the two data types is observed in the literature [e.g. Li et al. 2008; Golparvar-Fard et al. 2011]. Yet, the authors acknowledge that other use cases will exist in which a deep coupling between building models and point clouds is unnecessary or even undesirable. This paper presents an extension to the IFC schema by which an open and semantically rich standard arises.

Future: One of the core advantages of the HDF5 format is the usage of transparent block-level compression. HDF5 allows several compression schemes, including user-defined compression methods. These would allow much higher compression ratios by exploiting structural knowledge of the point cloud or by introducing additional lossiness in the compression methods. In the prototypical implementation only gzip compression is used. Especially the point cloud segments stored as height maps projected on parametric surfaces might be suitable for special-purpose compression methods, such as jpeg or png, which can exploit and filter imperceptible differences.
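A minimal h5py sketch of storing a point cloud segment with the transparent block-level gzip compression mentioned above; the dataset names and attributes are illustrative, not part of the proposed IFC schema extension:

```python
import numpy as np
import h5py

points = np.random.rand(1_000_000, 3).astype(np.float32)  # placeholder scan

with h5py.File("pointcloud.hdf5", "w") as f:
    dset = f.create_dataset("segment_0/points", data=points,
                            compression="gzip", compression_opts=6,
                            chunks=(65536, 3))
    dset.attrs["scanner_position"] = [0.0, 0.0, 1.5]       # example metadata

with h5py.File("pointcloud.hdf5", "r") as f:
    subset = f["segment_0/points"][:1000]    # chunked reads decompress lazily
```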

Lastly, future research will indicate how the associated point cloud structure presented in this paper can be paired with other spatial indexing structures to further advance the localized extraction of point cloud segments and spatial querying techniques. Further experiments will be conducted to harness and reuse the general purpose decomposition and aggregation relationships of the IFC to implement octrees and kd-trees to further enhance the structure and accessibility of the data.

Page 49: Image Restoration for 3D Computer Vision

Dynamic Surface Mesh Detail enhancement #1
Multi-scale geometric detail enhancement for time-varying surfaces
Graphical Models, Volume 76, Issue 5, September 2014, Pages 413-425
https://doi.org/10.1016/j.gmod.2014.03.010

We first develop an adaptive spatio-temporal bilateral filter, which produces temporally-coherent and feature-preserving multi-scale representation for the time-varying surfaces. We then extract the geometric details from the time-varying surfaces, and enhance geometric details by exaggerating detail information at each scale across the time-varying surfaces.

Velocity vectors estimation. The top row gives 4 frames in the time-varying surfaces, and the bottom row gives the corresponding velocity vectors for each frame

Multi-scale representation and detail enhancement for time-varying surfaces. First row: input time-varying surfaces; second row: multi-scale filtering results by filtering each frame individually; third row: multi-scale filtering results using the adaptive spatio-temporal filter; fourth and fifth rows: multi-scale detail enhancement results using 6 and 9 detail levels, respectively.

Limitations: In our current detail transfer results, we only transfer the detail of a static model to time-varying surfaces. Our current algorithm cannot transfer the geometry detail of time-varying surfaces to target time-varying surfaces, which is challenging since it is difficult to build the corresponding mapping between the source and target time-varying surfaces with different surface frames.

Another problem is that although our filtering and enhancement methods can alleviate jittering artifacts, for input time-varying surfaces with heavy jittering the artifacts still cannot be removed completely. Processing surface sequences with heavy jittering is a very hard problem, which requires further sophisticated investigation.

Page 50: Image Restoration for 3D Computer Vision

Surface reconstruction Data Priors #1
Surface reconstruction with data-driven exemplar priors
Oussama Remil, Qian Xie, Xingyu Xie, Kai Xu, Jun Wang
Computer-Aided Design Volume 88, July 2017, Pages 31-41
https://doi.org/10.1016/j.cad.2017.04.004

Given a noisy and sparse point cloud of a structurally complex mechanical part as input, our system produces consolidated points by aligning exemplar priors learned from a mechanical shape database. With the additional information, such as normals, carried by our exemplar priors, our method achieves better feature preservation than direct reconstruction on the input point cloud (e.g., Poisson).

An overview of our algorithm. We extract priors from a 3D shape database within the same category (e.g., mechanical parts) to construct a prior library. The affinity propagation clustering method is then performed on the prior library to obtain the set of representative priors, called the exemplar priors. Given an input point cloud, we construct its local neighborhoods and perform prior matching to find the most similar exemplar prior for each local neighborhood. Subsequently, we utilize the matched exemplar priors to consolidate the input point scan through an augmentation procedure, with which we can generate a faithful surface where sharp features and fine details are well recovered.

Limitations: Our method is expected to behave well across different shape categories; nevertheless, there are a few limitations to discuss. Our algorithm fails when dealing with more challenging repositories with a small number of redundant elements, such as complex organic shapes. In addition, if there are large holes or big missing parts within the input scans, our method may fail to complete them with its "matching-to-alignment" strategy.

Page 51: Image Restoration for 3D Computer Vision

Surface reconstruction Data Priors #2A

3D Reconstruction Supported by Gaussian Process Latent Variable Model Shape Priors
Jens Krenzin, Olaf Hellwich
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, May 2017, Volume 85, Issue 2, pp 97–112
https://doi.org/10.1007/s41064-017-0009-0

A 2D shape representing a filled circle, where black represents the outside of the object and white represents the inside of the object. b Corresponding signed distance function (SDF) for the shape shown in a. The 0-level is highlighted in red. c Discrete Cosine Transform (DCT) coefficients for the SDF shown in b. The first 15 DCT coefficients in each dimension store the important information about the shape. The remaining coefficients are nearly zero
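That figure's pipeline (binary shape → SDF → truncated DCT) is easy to reproduce; a minimal sketch, assuming SciPy's Euclidean distance transform and dctn stand in for the paper's exact implementation:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.fft import dctn, idctn

n = 64
yy, xx = np.mgrid[0:n, 0:n]
inside = (xx - n/2)**2 + (yy - n/2)**2 < (n/4)**2  # filled circle

# Signed distance: positive outside, negative inside (sign convention is a choice)
sdf = distance_transform_edt(~inside) - distance_transform_edt(inside)

# Keep only the low-frequency DCT block; it carries most of the shape information
coeffs = dctn(sdf, norm='ortho')
k = 15
truncated = np.zeros_like(coeffs)
truncated[:k, :k] = coeffs[:k, :k]
sdf_approx = idctn(truncated, norm='ortho')
print("max reconstruction error:", np.abs(sdf - sdf_approx).max())
```

Truncating hard at a coefficient block is also what produces the ringing artefacts discussed on the next slides.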

Page 52: Image Restoration for 3D Computer Vision

Surface reconstruction Data Priors #2B

3D Reconstruction Supported by Gaussian Process Latent Variable Model Shape Priors
Jens Krenzin, Olaf Hellwich
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, May 2017, Volume 85, Issue 2, pp 97–112
https://doi.org/10.1007/s41064-017-0009-0

Results for object A—cup. a Sample image. b Erroneous point cloud. c Ground truth. d Shape prior. e Corrected point cloud

This article presents a method that removes outliers, reduces noise and fills holes in a point cloud using a learned shape prior. The shape prior is learned from a set of training objects using the GP-LVM.

It has been shown that an interpolated shape between several training shapes often has ringing artefacts due to the DCT compression step. Several investigations were made on how these artefacts could be reduced. In the first investigation, the difference between the training shapes was reduced and the latent space became denser. As expected this reduced the Euclidean distance from one training example to the nearest training example. The closer two points are in the latent space, the more similar the corresponding shapes are. As a result of this the artefacts are reduced, but only slightly.

In the second investigation, the DCT compression step was removed. The GP-LVM then learns a lower-dimensional subspace directly on the SDF. It has been shown that this also leads to a slight reduction of the artefacts of the reconstructed shape, but the artefacts are still visible. In this work the GP-LVM was investigated as a candidate fulfilling the requirements. It has been shown that the number of shape parameters can be reduced, and that the model can be trained for specific object classes. Some of the experiments, related to model sparsity and well-behavedness, have revealed weaknesses of the presented method. These issues will be further investigated in future work.

Page 54: Image Restoration for 3D Computer Vision

Depth MAP Inpainting #1
Kinect depth inpainting in real time
Lucian Petrescu ; Anca Morar ; Florica Moldoveanu ; Alin Moldoveanu
Telecommunications and Signal Processing (TSP), 2016
https://doi.org/10.1109/TSP.2016.7760974

Example of output from the median filter: A) input depth map where black pixels are not sampled; B) output image after applying the median filter; C) difference between input and output: grayscale – sampled pixel, blue – inpainted; D) confidence: blue – filtered, white – sampled, red – unfiltered.
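A naive single-pass version of such a median-based hole fill (missing depth encoded as zeros; the real-time method is more elaborate and tracks per-pixel confidence):

```python
import numpy as np

def inpaint_depth_median(depth, radius=2, missing=0):
    """Fill missing depth pixels with the median of valid neighbours.
    Pixels with no valid neighbour in the window remain unfilled."""
    out = depth.copy()
    H, W = depth.shape
    ys, xs = np.where(depth == missing)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        window = depth[y0:y1, x0:x1]
        valid = window[window != missing]
        if valid.size:
            out[y, x] = np.median(valid)
    return out
```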

Page 55: Image Restoration for 3D Computer Vision

Depth MAP Inpainting #2
A new method for inpainting of depth maps from time-of-flight sensors based on a modified closing by reconstruction algorithm
Journal of Visual Communication and Image Representation, Volume 47, August 2017, Pages 36-47
https://doi.org/10.1016/j.jvcir.2017.05.003

This procedure uses a modified morphological closing by reconstruction algorithm.

Finally, the proposed method works properly on depth maps where regions are sufficiently well defined, or at least well enough to infer the missing information, e.g., depth maps obtained in indoor scenarios or acquired with sensors or methods that achieve these characteristics. Low-quality depth maps and those acquired in outdoor conditions may require additional pre-processing stages or even more robust methods, because the holes present in such images tend to be larger.
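For reference, the classical (unmodified) closing by reconstruction is compact to express with scikit-image; the paper's modifications are not reproduced in this sketch:

```python
from skimage.morphology import dilation, disk, reconstruction

def closing_by_reconstruction(depth, radius=5):
    """Classical closing by reconstruction: dilate, then reconstruct by
    erosion with the original image as the mask. Dark holes smaller than
    the structuring element are closed while region contours are preserved."""
    seed = dilation(depth, disk(radius))  # seed >= depth everywhere
    return reconstruction(seed, depth, method='erosion')
```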

Filling Kinect depth holes via position-guided matrix completion
Zhongyuan Wang, Xiaowei Song, ShiZheng Wang, Jing Xiao, Rui Zhong, Ruimin Hu
Neurocomputing Volume 215, 26 November 2016, Pages 48-52
https://doi.org/10.1016/j.neucom.2015.05.146

Page 56: Image Restoration for 3D Computer Vision

Depth MAP Inpainting #3
Learning-based super-resolution with applications to intensity and depth images
Haoheng Zheng, Doctor of Philosophy thesis, School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2014
http://ro.uow.edu.au/theses/4284

Geometric Inpainting of 3D Structures
Pratyush Sahay, A. N. Rajagopalan
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015, pp. 1-7
https://doi.org/10.1109/CVPRW.2015.7301388

The results demonstrate the effectiveness of the proposed framework, albeit with occasional minor local artifacts.

“Low-rank Theory”[184] Candes et al. (2011) “Robust principal component analysis?” Journal of the ACM (JACM) Volume 58 Issue 3, May 2011 https://doi.org/10.1145/1970392.1970395

Page 57: Image Restoration for 3D Computer Vision

Depth MAP super-resolution #1
Depth map super resolution
Murat Gevrekci ; Kubilay Pakin
Image Processing (ICIP), 2011 18th IEEE
https://doi.org/10.1109/ICIP.2011.6116454

Depth map acquisition with a ToF camera at different integration times. The image on the left is captured with a 10 ms integration time; note how the background depth information is noisy. The image on the right is captured with a 50 ms integration time: background depth is captured reliably at the expense of saturating the near field.

We propose changing our constraint sets not only on a single range image but across differently exposed range images to increase depth resolution within the whole working space. This concept resembles High Dynamic Range (HDR) [Gevrekci and Gunturk 2007] image formation using differently exposed images. The proposed algorithm merges useful depth information from different exposure levels and eliminates contaminated data (i.e., saturation, noise), as the sketch below illustrates.
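One way to realize such a merging step: keep, per pixel, only the samples flagged reliable in each exposure and average them. The validity masks below are assumed to come from saturation and noise tests, which this sketch does not model:

```python
import numpy as np

def merge_depth_exposures(depths, valid_masks):
    """Fuse depth maps captured at different integration times.
    depths: list of (H, W) arrays; valid_masks: matching boolean arrays
    marking pixels that are neither saturated nor too noisy."""
    stack = np.stack([d.astype(float) for d in depths])
    masks = np.stack(valid_masks).astype(float)
    weight = masks.sum(axis=0)
    fused = (stack * masks).sum(axis=0) / np.maximum(weight, 1.0)
    fused[weight == 0] = np.nan  # no reliable sample at this pixel
    return fused
```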

Modeling the imaging pipeline is a critical step in image enhancement, as demonstrated by the authors [Gevrekci and Gunturk 2005]. We propose modeling the depth map as a function of internal camera parameters, object and camera motion, and photometric changes due to the camera response function and alternating integration time.

Spatially Adaptive Tensor Total Variation-Tikhonov Model for Depth Image Super Resolution
Gang Zhong ; Sen Xiang ; Peng Zhou ; Li Yu
IEEE Access (Volume: 5, 2017)
https://doi.org/10.1109/ACCESS.2017.2715981

Visual comparison of 4× super resolution results on our synthetic scene Chess: (a) the ground-truth depth image. Super resolution results (b) using Tikhonov regularization, (c) using total variation regularization, (d) using color guided tensor total variation regularization, (e) using fused edge map guided tensor total variation regularization, (f) using the spatially adaptive tensor total variation - Tikhonov regularization.

Page 58: Image Restoration for 3D Computer Vision

Depth MAP super-resolution #2
Image-guided ToF depth upsampling: a survey
Iván Eichhardt, Dmitry Chetverikov, Zsolt Jankó
Machine Vision and Applications, May 2017, Volume 28, Issue 3–4, pp 267–282
https://doi.org/10.1007/s00138-017-0831-9

Effect of imprecise calibration on depth upsampling. The discrepancy between the input depth and colour images is 2, 5 and 10 pixels, respectively

Effect of optical radial distortion on depth upsampling

Page 59: Image Restoration for 3D Computer Vision

Depth Map super-resolution #3
Super-resolution Reconstruction for Binocular 3D Data
Wei-Tsung Hsiao ; Jing-Jang Leou ; Han-Hui Hsiao
Pattern Recognition (ICPR), 2014
https://doi.org/10.1109/ICPR.2014.721

Depth Superresolution using Motion Adaptive Regularization
Ulugbek S. Kamilov, Petros T. Boufounos (Submitted on 4 Mar 2016)
https://arxiv.org/abs/1603.01633

Our motion adaptive method recovers a high-resolution depth sequence from high-resolution intensity and low-resolution depth sequences by imposing rank constraints on the depth patches: (a) and (b) t-y slices of the color and depth sequences, respectively, at a fixed x; (c)–(e) x-y slices at t1 = 10; (f)–(h) x-y slices at t2 = 40; (c) and (f) input color images; (d) and (g) input low-resolution and noisy depth images; (e) and (h) estimated depth images.

Illustration of the block matching within a space-time search area. The area in the current frame t is centered at the reference patch. Search is also conducted in the same window position in multiple temporally adjacent frames. Similar patches are grouped together to construct a block β_p = B_p φ.

Visual evaluation on Road video sequence. Estimation of depth from its 3× downsized version at 30 dB input SNR. Row 1 shows the data at time instance t = 9. Row 2 shows the data at the time instance t = 47. Row 3 shows the t-y profile of the data at x = 64. Highlights indicate some of the areas where depth estimated by GDS-3D recovers details missing in the depth estimate of DS-3D that does not use intensity information.

Page 60: Image Restoration for 3D Computer Vision

Depth Map super-resolution #4
Depth Map Restoration From Undersampled Data
Srimanta Mandal ; Arnav Bhavsar ; Anil Kumar Sao
IEEE Transactions on Image Processing (Volume: 26, Issue: 1, Jan. 2017)
https://doi.org/10.1109/TIP.2016.2621410

The objective of the paper: (a) uniform up-sampling of an LR depth map, i.e., filling in missing information on an HR grid generated from a uniformly sampled LR depth map – can be addressed by SR; (b) non-uniform up-sampling of a sparse point cloud, i.e., filling in the missing information on a randomly filled HR grid – can be addressed by PCC; (c) an extreme case of non-uniform up-sampling, where very little data is available. We suggest an approach wherein this is interpreted as non-uniform up-sampling followed by uniform up-sampling – can be addressed by PCC-SR.

We have addressed the problem of depth restoration by up-sampling either a uniformly sampled LR depth map or a sparse, non-uniformly sampled point cloud in a unified sparse representation framework.

Page 61: Image Restoration for 3D Computer Vision

Depth MAP Joint Superresolution-Inpainting #1
Range map superresolution-inpainting, and reconstruction from sparse data
Computer Vision and Image Understanding Volume 116, Issue 4, April 2012, Pages 572-591
https://doi.org/10.1016/j.cviu.2011.12.005

Depth map inpainting and super-resolution based on internal statistics of geometry and appearance
Satoshi Ikehata ; Ji-Ho Cho ; Kiyoharu Aizawa
Image Processing (ICIP), 2013 20th IEEE
https://doi.org/10.1109/ICIP.2013.6738194

In this paper, we have proposed depth-map inpainting and super-resolution algorithms which explicitly capture the internal statistics of a depth-map and its registered texture image and have demonstrated their state-of-the-art performance. The current limitation is that we have assumed the accurate registration of the texture image and have not assumed the presence of sensor noise. In future work, we will evaluate our method’s robustness to these problems to assess its handling of more practical situations.

Range image expansion and inpainting. (a and d) LR images with missing data for the apple and birdhouse datasets. (b and e) Interpolated images with missing data. (c and f) Range expansion with inpainting using the proposed method. 3D reconstructions with light-rendering and gray-scale representation, respectively, for (g and h) apple and (i and j) birdhouse.

Range expansion with inpainting across different objects. (a) Interpolated range observation. (b) Corresponding HR and inpainted range output using the proposed method. (c–e) Unlinked, Linked and residual edge maps, respectively, which are used to restrict the smoothness across edges.

Effect of noise on edge-linking. (a) Noisy observation. (e) Corresponding HR and inpainted output. (b–d) Unlinked, linked and residual edges when no noise is added to the observation. (f–h) Unlinked, linked and residual edges for the observation in (a).

Page 62: Image Restoration for 3D Computer Vision

Depth MAP Joint Superresolution-Inpainting #2
Superpixel-based depth map enhancement and hole filling for view interpolation
Proceedings Volume 10420, Ninth International Conference on Digital Image Processing (ICDIP 2017); 104202O (2017)
http://dx.doi.org/10.1117/12.2281544

Depth enhancement with improved exemplar-based inpainting and joint trilateral guided filtering
Liang Zhang ; Peiyi Shen ; Shu'e Zhang ; Juan Song ; Guangming Zhu
Image Processing (ICIP), 2016 IEEE
https://doi.org/10.1109/ICIP.2016.7533131

Superpixel-based initial depth map refinement: (a) superpixel segmentation of the color image, (b) initial depth map segmentation using the same superpixel label as (a), (c) initial depth map before refinement, (d) enhanced depth map of (c).

Superpixel-based warped depth map hole filling: (a) and (b) are superpixels with hole regions, (c) and (d) are hole filling results of (a) and (b), respectively.

In this paper, we propose an efficient superpixel-based depth information processing method for view interpolation. First, the color image is segmented into superpixels using the SLIC algorithm, and the associated initial depth map is segmented with the same labels. After that, the depth-missing pixels are recovered by considering the color and depth superpixels jointly. Furthermore, the holes caused by disocclusion in the warped depth map can also be filled in the superpixel domain. Experimental results demonstrate that, with the incorporation of the proposed initial depth map enhancement and warped depth map hole filling, better view interpolation performance is achieved.
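A condensed sketch of the superpixel-guided fill idea, substituting a per-superpixel median for the paper's joint recovery (scikit-image's SLIC is used for the segmentation):

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_depth_fill(color, depth, n_segments=400, missing=0):
    """Fill missing depth values with the median valid depth of the
    superpixel they fall in; superpixels are computed on the colour image
    and the same labels are applied to the depth map."""
    labels = slic(color, n_segments=n_segments, compactness=10)
    out = depth.astype(float).copy()
    for lbl in np.unique(labels):
        region = labels == lbl
        vals = depth[region]
        valid = vals[vals != missing]
        if valid.size:
            out[region & (depth == missing)] = np.median(valid)
    return out
```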

Page 64: Image Restoration for 3D Computer Vision

Image restoration Loss functions & Quality metrics #1A

Loss Functions for Image Restoration With Neural Networks
Hang Zhao ; Orazio Gallo ; Iuri Frosio ; Jan Kautz (NVIDIA, MIT Media Lab)
IEEE Transactions on Computational Imaging (Volume: 3, Issue: 1, March 2017)
https://doi.org/10.1109/TCI.2016.2644865

The loss layer, despite being the effective driver of the network's learning, has attracted little attention within the image processing research community: the choice of the cost function generally defaults to the squared l2 norm of the error [Jain et al. 2009; Burger et al. 2012; Dong et al. 2014; Wang 2014]. This is understandable, given the many desirable properties this norm possesses. There is also a less well-founded, but just as relevant, reason for the continued popularity of l2: standard neural network packages, such as Caffe, only offer the implementation for this metric.

However, l2 suffers from well-known limitations. For instance, when the task at hand involves image quality, l2 correlates poorly with image quality as perceived by a human observer [Zhang et al. 2012]. This is because of a number of assumptions implicitly made when using l2. First and foremost, the use of l2 assumes that the impact of noise is independent of the local characteristics of the image. On the contrary, the sensitivity of the Human Visual System (HVS) to noise depends on local luminance, contrast, and structure [Wang et al. 2004]. The l2 loss also works under the assumption of white Gaussian noise, which is not valid in general [e.g. Wang and Bovik 2009].

We focus on the use of neural networks for image restoration tasks, and we study the effect of different metrics for the network's loss layer. We compare l2 against four error metrics on representative tasks: image super-resolution, JPEG artifact removal, and joint denoising plus demosaicking. First, we test whether a different local metric such as l1 can produce better results. We then evaluate the impact of perceptually motivated metrics. We use two state-of-the-art metrics for image quality: the structural similarity index (SSIM [Wang et al. 2004]) and the multi-scale structural similarity index (MS-SSIM [Wang et al. 2003]). We choose these among the plethora of existing indexes because they are established measures and because they are differentiable, a requirement for the backpropagation stage. As expected, on the use cases we consider, the perceptual metrics outperform l2. However, and perhaps surprisingly, this is also true for l1; see Figure 1. Inspired by this observation, we propose a novel loss function and show its superior performance in terms of all the metrics we consider.
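To make the comparison concrete, mean SSIM is short to compute in NumPy; this sketch uses a uniform window instead of the Gaussian window of the reference formulation, with the usual constants for images in [0, 1]:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mean_ssim(x, y, win=7, C1=0.01**2, C2=0.03**2):
    """Mean SSIM over local windows: compares luminance, contrast and
    structure via local means, variances and covariance."""
    x, y = x.astype(float), y.astype(float)
    mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x**2
    var_y = uniform_filter(y * y, win) - mu_y**2
    cov = uniform_filter(x * y, win) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
               ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2))
    return ssim_map.mean()

# The baselines under discussion, for comparison:
l1 = lambda x, y: np.abs(x - y).mean()
l2 = lambda x, y: ((x - y) ** 2).mean()
```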

Page 65: Image Restoration for 3D Computer Vision

Image restoration Loss functions & Quality metrics #1B

However, it is widely accepted that l2, and consequently the Peak Signal-to-Noise Ratio (PSNR), do not correlate well with human perception of image quality: l2 simply does not capture the intricate characteristics of the human visual system (HVS).

There exists a rich literature of error measures, both reference-based and non-reference-based, that attempt to address the limitations of the simple l2 error function. For our purposes, we focus on reference-based measures. A popular reference-based index is the structural similarity index (SSIM). SSIM evaluates images accounting for the fact that the HVS is sensitive to changes in local structure. Wang et al. 2003 extend SSIM, observing that the scale at which local structure should be analyzed is a function of factors such as image-to-observer distance. To account for these factors, they propose MS-SSIM, a multi-scale version of SSIM that weighs SSIM computed at different scales according to the sensitivity of the HVS. Experimental results have shown the superiority of SSIM-based indexes over l2. As a consequence, SSIM has been widely employed as a metric to evaluate image processing algorithms. Moreover, given that it can be used as a differentiable cost function, SSIM has also been used in iterative algorithms designed for image compression [Wang and Bovik 2009], image reconstruction [Brunet et al. 2010], denoising and super-resolution [Rehman et al. 2012], and even downscaling [Öztireli and Gross 2015]. To the best of our knowledge, however, SSIM-based indexes have never been adopted to train neural networks.

Recently, novel image quality indexes based on the properties of the HVS have shown improved performance when compared to SSIM and MS-SSIM. One of these is the Information Weighted SSIM (IW-SSIM), a modification of MS-SSIM that also includes a weighting scheme proportional to the local image information [Wang and Li 2011]. Another is the Visual Information Fidelity (VIF), which is based on the amount of shared information between the reference and distorted image [Sheikh and Bovik 2006]. The Gradient Magnitude Similarity Deviation (GMSD) is characterized by simplified math and performance similar to that of SSIM, but it requires computing the standard deviation over the whole image [Xue et al. 2014]. Finally, the Feature Similarity Index (FSIM) leverages the perceptual importance of phase congruency, and measures the dissimilarity between two images based on local phase congruency and gradient magnitude [Zhang et al. 2011]. FSIM has also been extended to FSIMc, which can be used with color images. Despite the improved accuracy they offer in terms of image quality, the mathematical formulation of these indexes is generally more complex than SSIM and MS-SSIM, and possibly not differentiable, which makes their adoption in optimization procedures not immediate.

Page 66: Image Restoration for 3D Computer Vision

Point cloud transformations

Numerical geometry of non-rigid shapes, Michael Bronstein
http://slideplayer.com/slide/4925779/

Left: Intrinsic vs. extrinsic properties of shapes. Top left: original shape. Top right: reconstructed shape from a geometry image with cut edges displayed in red. The middle and bottom rows show the geometry images encoding the y coordinates and the HKS, respectively, of two spherical parameterizations (left and right). The two spherical parameterizations are symmetrically rotated by 180 degrees along the Y-axis. The geometry images for the y coordinate display an axial as well as an intensity flip, whereas the geometry images for the HKS only display an axial flip. This is because the HKS is an intrinsic shape signature (geodesics are preserved) whereas point coordinates on a shape surface are not. Center: intrinsic descriptors (here the HKS) are invariant to shape articulations. Right: padding structure of geometry images: the geometry images for the 3 coordinates are replicated to produce a 3×3 grid. The center image in each grid corresponds to the original geometry image. Observe that no discontinuities exist along the grid edges. Sinha et al. (2016)

Left: Geometry images created by fixing the polar axis of a hand (top) and an aeroplane (bottom), and rotating the spherical parametrization by equal intervals along the axis. The cut is highlighted in red. Center: four rotated geometry images for a different cut location highlighted in red. The plots to the right show padded geometry images wherein the similarity across rotated geometry images is more evident and the five finger features are coherently visible. Right: changing the viewing direction for a cut inverts the geometry image. The similarity in geometry images for the two diametrically opposite cuts emerges when we pad the image in a 3×3 grid. Sinha et al. (2016)

Authalic vs conformal parametrization: (left to right) 2500 vertices of the hand mesh are color coded in the first two plots. A 64×64 geometry image is created by uniformly sampling a parametrization, and then interpolating the nearby feature values. The authalic geometry image encodes all tip features. Conformal parametrization compresses high-curvature points into dense regions [Gu et al. 2003]; hence, the finger tips are all mapped to very small regions. The fourth plot shows that the resolution of the geometry image is insufficient to capture the tip feature colors under conformal parametrization. This is validated by reconstructing the shape from geometry images encoding x, y, z locations for both parameterizations in the final two plots.

Page 67: Image Restoration for 3D Computer Vision

2D super-resolution techniques for Geometry images
MemNet: A Persistent Memory Network for Image Restoration
Ying Tai, Jian Yang, Xiaoming Liu, Chunyan Xu (Submitted on 7 Aug 2017)
https://arxiv.org/abs/1708.02209
https://github.com/tyshiwo/MemNet

The same MemNet structure achieves the state-of-the-art performance in image denoising, super-resolution and JPEG deblocking. Due to the strong learning ability, our MemNet can be trained to handle different levels of corruption even using a single model.

CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training
Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua (Submitted on 29 Mar 2017)
https://arxiv.org/abs/1703.10155
https://github.com/tatsy/keras-generative

The proposed method can support a wide variety of applications, including image generation, attribute morphing, image inpainting, and data augmentation for training better face recognition models

Page 68: Image Restoration for 3D Computer Vision

Surfaces segmentation and correspondence
Convolutional Neural Networks on Surfaces via Seamless Toric Covers
Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G. Kim, Yaron Lipman | Weizmann Institute of Science, Adobe Research
ACM Transactions on Graphics (TOG) Volume 36, Issue 4, July 2017, Article No. 71
http://dx.doi.org/10.1145/3072959.3073616

Parameterization produced by the geometry image method of [Sinha et al. 2016]; the parameterization is not seamless as the isolines break at the dashed image boundary (right); although the parameterization preserves area it produces large variability in shape.

Computing the flat-torus structure (middle) on a 4-cover of a sphere-type surface (left) defined by prescribing three points (colored disks). The right inset shows the flat-torus resulting from a different triplet choice.

Visualization of “easy” functions on the surface (top-row) and their pushed version on the flat-torus (bottom-row). We show three examples of functions we use as input to the network: (a) average geodesic distance (left), (b) the x component of the surface normal (middle), and (c) Wave Kernel Signature [Aubry et al. 2011]. The blowup shows the face area, illustrating that the input functions capture relevant information in the shape.

Experiments show that our method is able to learn and generalize semantic functions better than state of the art geometric learning approaches in segmentation tasks. Furthermore, it can use only basic local data (Euclidean coordinates, curvature, normals) to achieve high success rate, demonstrating ability to learn high-level features from a low-level signal. This is the key advantage of defining a local translation invariant convolution operator. Finally, it is easy to implement and is fully compatible with current standard CNN implementations for images.

A limitation of our technique is that it assumes the input shape is a mesh with a sphere-like topology. An interesting direction for future work is extending our method to meshes with arbitrary topologies. This problem is especially interesting since in certain cases shapes from the same semantic class may have different genus. Another limitation is that currently aggregation is done as a separate post-process step and not as a part of the CNN optimization. An interesting future work in this regard is to incorporate the aggregation in the learning stage and produce end-to-end learning framework.

Page 70: Image Restoration for 3D Computer Vision

Point clouds Classification and segmentation
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas | Stanford University (Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02413
https://github.com/charlesq34/pointnet2 (TensorFlow)

Shapes in SHREC15 are 2D surfaces embedded in 3D space. Geodesic distances along the surfaces naturally induce a metric space. We show through experiments that adopting PointNet++ in this metric space is an effective way to capture intrinsic structure of the underlying point set

We follow Rustamov et al. (2009) to obtain an embedding metric that mimics geodesic distance. Next we extract intrinsic point features in this metric space including Wave Kernel Signature (WKS) [Aubry et al. 2011], Heat Kernel Signature (HKS) [Sun et al. 2009] and multi-scale Gaussian curvature [Meyer et al. 2003].

We use these features as input and then sample and group points according to the underlying metric space. In this way, our network learns to capture multi-scale intrinsic structure that is not influenced by the specific pose of a shape. Alternative design choices include using XYZ coordinates as point features or using Euclidean space R3 as the underlying metric space. We show below that these are not optimal choices.
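The sampling stage of such a sample-and-group hierarchy is typically farthest point sampling; a minimal Euclidean version is sketched below (PointNet++ can equally run it under the geodesic-mimicking metric discussed above):

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k points from an (N, 3) array so that each new point
    is as far as possible from all points chosen so far; gives better
    coverage of the set than uniform random sampling."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=int)  # first centroid: point 0
    dist = np.full(n, np.inf)        # distance to nearest chosen point
    for i in range(1, k):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = np.argmax(dist)
    return points[chosen]
```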

Aubry et al. 2011

Page 71: Image Restoration for 3D Computer Vision

Point clouds Novel descriptors
Learning Compact Geometric Features
Marc Khoury, Qian-Yi Zhou, Vladlen Koltun (Submitted on 15 Sep 2017)
https://arxiv.org/abs/1709.05056

We present an approach to learning features that represent the local geometry around a point in an unstructured point cloud. Such features play a central role in geometric registration, which supports diverse applications in robotics and 3D vision.

The presented approach yields a family of features, parameterized by dimension, that are both more compact and more accurate than existing descriptors.

Background: The development of geometric descriptors for rigid alignment of unstructured point clouds dates back to the 90s. Classic descriptors include Spin Images [Johnson and Hebert 1999] and 3D Shape Context [Frome et al. 2004]. More recent work introduced Point Feature Histograms (PFH) [Rusu et al. 2008], Fast Point Feature Histograms (FPFH) [Rusu et al. 2009], Signature of Histograms of Orientations (SHOT) [Salti et al. 2014], and Unique Shape Contexts (USC) [Tombari et al. 2010].

A comprehensive evaluation of existing local geometric descriptors is reported by Guo et al. 2016.

The learned descriptor is both more precise and more compact than handcrafted features. Due to its Euclidean structure, the learned descriptor can be used as a drop-in replacement for existing features in robotics, 3D vision, and computer graphics applications. We expect future work to further improve precision, compactness, and robustness, possibly using new approaches to optimizing feature embeddings [Ustinova and Lempitsky 2016, https://github.com/madkn/HistogramLoss, https://youtu.be/_N1qYrv321E].

Page 72: Image Restoration for 3D Computer Vision

Dense Grid Point clouds generative model
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction
Chen-Hsuan Lin, Chen Kong, Simon Lucey (Submitted on 21 Jun 2017)
https://arxiv.org/abs/1706.07036

We use 2D convolutional operations to predict the 3D structure from multiple viewpoints and jointly apply geometric reasoning with 2D projection optimization. We introduce the pseudo-renderer, a differentiable module that approximates the true rendering operation, to synthesize novel depth maps for optimization. Experimental results on single-image 3D object reconstruction tasks show that our method outperforms state-of-the-art methods in terms of shape similarity and prediction density.

Network architecture. From an encoded latent representation, we propose to use a structure generator, based on 2D convolutional operations, to predict the 3D structure at N viewpoints. The point clouds are fused by transforming the 3D structure at each viewpoint to canonical coordinates. The pseudo-renderer synthesizes depth images from novel viewpoints, which are further used for joint 2D projection optimization. It contains no learnable parameters and reasons based purely on 3D geometry.

Concept of pseudo-rendering. Multiple transformed 3D points may project onto the same pixel in the image space. (a) Collisions easily occur if projections are directly discretized. (b) Upsampling the target image increases the precision of the projection locations and thus alleviates the collision effect. A max-pooling operation on the inverse depth values follows to obtain the original resolution while maintaining the effective depth value at each pixel. (c) Examples of pseudo-rendered depth images with various upsampling factors U (only valid depth values without collision are shown). Pseudo-rendering comes closer to true rendering with higher values of U.
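A sketch of that collision-avoidance idea: splat points into an up-sampled inverse-depth buffer, keep the nearest sample per cell with a scatter-max, then max-pool back to the target resolution. The pinhole projection with intrinsics K is an assumption of this sketch, not the paper's exact formulation:

```python
import numpy as np

def pseudo_render(points_cam, K, h, w, up=4):
    """Z-buffer-style splatting of camera-frame 3D points into an (h, w)
    depth map. Projecting onto an up-sampled grid reduces collisions;
    max-pooling inverse depth keeps the nearest surface per output pixel."""
    inv_depth = np.zeros((h * up, w * up))
    ok = points_cam[:, 2] > 0                       # keep points in front of the camera
    X, Y, Z = points_cam[ok, 0], points_cam[ok, 1], points_cam[ok, 2]
    u = np.round((K[0, 0] * X / Z + K[0, 2]) * up).astype(int)
    v = np.round((K[1, 1] * Y / Z + K[1, 2]) * up).astype(int)
    inb = (u >= 0) & (u < w * up) & (v >= 0) & (v < h * up)
    np.maximum.at(inv_depth, (v[inb], u[inb]), 1.0 / Z[inb])  # scatter-max on 1/Z
    pooled = inv_depth.reshape(h, up, w, up).max(axis=(1, 3))  # max-pool back
    return np.where(pooled > 0, 1.0 / np.maximum(pooled, 1e-12), 0.0)
```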

Page 73: Image Restoration for 3D Computer Vision

Point clouds GAN #1A

Representation Learning and Adversarial Generation of 3D Point Clouds
Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, Leonidas Guibas (same last author as for PointNet++) (Submitted on 8 Jul 2017)
https://arxiv.org/abs/1707.02392

Editing parts in point clouds using vector arithmetic on the autoencoder (AE) latent space. Left to right: tuning the appearance of cars towards the shape of convertibles, adding armrests to chairs, removing handle from mug.

We build an end-to-end pipeline for 3D point clouds that uses an AE to create a latent representation, and a GAN to generate new samples in that latent space. Our AE is designed with a structural loss tailored to unordered point clouds. Our learned latent space, while compact, has excellent class-discriminative ability: per our classification results, it outperforms recent GAN-based representations by 4.3%. In addition, the latent space allows for vector arithmetic, which we apply in a number of shape editing scenarios, such as interpolation and structural manipulation

We argue that jointly learning the representation and training the GAN is unnecessary for our modality. We propose a workflow that first learns a representation by training an AE with a compact bottleneck layer, then trains a plain GAN in that fixed latent representation. One benefit of this approach is that AEs are a mature technology: training them is much easier and they are compatible with more architectures than GANs.

We point to theory [Arjovsky and Bottou. 2017] that supports this idea, and verify it empirically: we show that GANs trained in our learned AE-based latent space generate visibly improved results, even with a generator and discriminator as shallow as a single hidden layer. Within a handful of epochs, we generate geometries that are recognized in their right object class at a rate close to that of ground truth data. Importantly, we report significantly better diversity measures (10x divergence reduction) over the state of the art, establishing that we cover more of the original data distribution. In summary, we contribute

● An effective cross-category AE-based latent representation on point clouds.

● The first (monolithic) GAN architecture operating on 3D point clouds.

● A surprisingly simpler, state-of-the-art GAN working in the AE’s latent space.

Page 74: Image Restoration for 3D Computer Vision

Point clouds GAN #1B

Raw point cloud GAN (r-GAN). The first version of our generative model operates directly on the raw 2048 × 3 point set input, with a 512-dimensional noise vector as the generator's input.

Finally, training a GAN in the latent space is much faster and much more stable. The inset provides some intuition with a toy example, where the data live in a 1D circular manifold. The density in red is the result of training a GAN’s generator in the original, 2D, data space. The most commonly used GAN objectives are equivalent to minimizing the Jensen-Shannon divergence (JSD) between the generator and data distributions. Unfortunately, the JSD is part of a family of divergences that become unbounded when there is support mismatch, which is the case in the example: the GAN places a lot of mass outside the data manifold. On the other hand, when training a small GAN in the fixed latent space of a trained AE (blue), the overlap of the two distributions increases significantly. According to recent theoretical advances [Arjovsky and Bottou. 2017] this should improve stability.

Latent-space GAN (l-GAN). In our latent-space GAN (l-GAN), instead of operating on the raw point cloud input, we pass the data through our pre-trained autoencoder, trained separately for each object class with the earth mover's distance (EMD) loss function. Both the generator and the discriminator of the GAN then operate on the 512-dimensional bottleneck variable of the AE. Once GAN training is over, the output of the generator is decoded to a point cloud via the AE decoder. The architecture of the l-GAN is significantly simpler than that of the r-GAN: we found that very shallow designs for both the generator and discriminator (in our case, 1 hidden layer for the generator and 2 for the discriminator) are sufficient to produce realistic results.
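For small point sets, the EMD between two equal-size clouds reduces to an optimal one-to-one assignment, which SciPy solves exactly; practical pipelines use faster approximations, so this is only a reference sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd_exact(a, b):
    """Earth mover's distance between two (N, 3) point sets of equal size,
    via the Hungarian algorithm on the pairwise distance matrix."""
    cost = cdist(a, b)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```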

An interesting avenue for future work involves further exploring the idea of ingesting point clouds by sorting them lexicographically before applying a 1D convolution. A possibly interesting extension would be to study different 1D orderings that capture locality differently, e.g. Hilbert space-filling curves. We could also aim for convolution operators of higher order (2D and 3D).