
ShellNet: Efficient Point Cloud Convolutional Neural Networks using Concentric Shells Statistics

Zhiyuan Zhang1    [email protected]
Binh-Son Hua2     [email protected]
Sai-Kit Yeung3    [email protected]

1 Singapore University of Technology and Design
2 The University of Tokyo
3 The Hong Kong University of Science and Technology

Contributions

• ShellConv, a simple yet effective convolution operator for orderless point clouds.
• ShellNet, an efficient neural network based on ShellConv.
• State-of-the-art accuracy and efficiency on object classification, part segmentation, and semantic segmentation.

Quantitative Results

Classification results on ModelNet40:

Method            Core Operator          Input format   OA
FPNN              1D Conv.               Point          87.5
Vol. CNN          3D Conv.               Voxel          89.9
O-CNN             Sparse 3D Conv.        Octree         90.6
Pointwise         Point Conv.            Point          86.1
PointNet          Point MLP              Point          89.2
PointNet++        Multiscale Point MLP   Point+Normal   90.7
PointCNN          X-Conv                 Point          92.2
ShellNet (ss=8)   ShellConv              Point          91.0
ShellNet (ss=16)  ShellConv              Point          93.1
ShellNet (ss=32)  ShellConv              Point          93.1
ShellNet (ss=64)  ShellConv              Point          92.8

Segmentation results:

Method       ShapeNet (mpIoU)   ScanNet (OA)   S3DIS (mIoU)   Semantic3D (mIoU)
SyncCNN      82.0               -              -              -
SpiderCNN    81.7               -              -              -
SplatNet     83.7               -              -              -
SO-Net       81.0               -              -              -
SGPN         82.8               -              50.4           -
PCNN         81.8               -              -              -
KCNet        82.2               -              -              -
3DmFV-Net    81.0               -              -              -
DGCNN        82.3               -              56.1           -
RSNet        81.4               -              56.5           -
PointNet     80.4               73.9           47.6           -
PointNet++   81.9               84.5           -              -
PointCNN     84.6               85.1           65.4           -
TMLC-MSR     -                  -              -              54.2
DeePr3SS     -                  -              -              58.5
SnapNet      -                  -              -              59.1
SegCloud     -                  -              -              61.3
SPG          -                  -              62.1           73.2
ShellNet     82.8               85.2           66.8           69.4

Timing

Accuracy of point cloud classification for different methods over training time and epochs. ShellNet achieves over 80% accuracy within two minutes and reaches 90% on the test set after only 15 minutes of training.

Qualitative Results

Object Part Segmentation on ShapeNet (prediction vs. ground truth):

Indoor Scene Semantic Segmentation on S3DIS:

Outdoor Scene Semantic Segmentation on Semantic3D:

Network Efficiency

Method          Params   FLOPs (Train/Infer)   Time (Train/Infer)
PointNet        3.5M     44.0B / 14.7B         0.068s / 0.015s
PointNet++      12.4M    67.9B / 26.9B         0.091s / 0.027s
3DmFV           45.77M   48.6B / 16.9B         0.101s / 0.039s
DGCNN           1.84M    131.4B / 44.3B        0.171s / 0.064s
PointCNN        0.6M     93.0B / 25.3B         0.031s / 0.012s
ShellNet        0.48M    15.8B / 2.8B          0.066s / 0.023s
 with small RF  0.48M    9.51B / 1.5B          0.025s / 0.011s

Trainable parameters, FLOPs, and running time comparisons. Compared to other methods, ShellNet is lightweight and fast while being accurate. Reducing the receptive field (small RF) by setting a smaller shell size can make computation even faster, as the neighbor query becomes cheaper.

Ablation Study

                  (A)       (B)        (C)       (D)
1. Sampling       Random    Farthest   Random    Random
2. Shell size     Fixed     Fixed      Dynamic   Fixed
3. KNN type       xyz       xyz        xyz       Features
Accuracy (%)      93.1      93.1       92.7      92.4
Train Time        0.066s    0.078s     0.118s    0.081s
Infer. Time       0.023s    0.024s     0.033s    0.029s

Experiments with neighbor point sampling. Setting (A) is the default strategy; settings (B), (C), and (D) modify (A) in point sampling type, shell size, and neighbor query features, respectively. As can be seen, (B) farthest point sampling, (C) equidistant shells, and (D) latent features for neighborhood construction produce similar accuracy, but training and inference time become slower.

Acknowledgement

The authors acknowledge support from the SUTD Digital Manufacturing and Design Centre, funded by the Singapore National Research Foundation, and an internal grant from HKUST (R9429).

ShellConv

Algorithm 1: ShellConv operator
Input: p, Ω_p, {F_prev(q) : q ∈ Ω_p}          * representative point, its neighbor point set, and the previous-layer features of that point set
Output: convolved features F_p

{q} ← {q − p : ∀q ∈ Ω_p}                       * centralize the neighborhood with p as the center
{F_local(q)} ← {mlp(q)}                        * lift each q to a higher-dimensional space
{F(q)} ← {[F_prev(q), F_local(q)]}             * concatenate the previous-layer and local features
{S} ← {S : q ∈ Ω_S}                            * assign each q to a shell S according to its distance to the center p
{F(S)} ← {maxpool({F(q) : q ∈ Ω_S}) : ∀S}      * obtain a fixed-size feature per shell by max-pooling over all points in the shell
F_p ← conv({F(S)})                             * apply a 1D convolution over the shell features, from inner to outer
return F_p
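To make the steps concrete, here is a minimal NumPy sketch of one ShellConv forward pass for a single representative point, mirroring Algorithm 1. The name shell_conv, the single-layer MLP, the equal-count shell partition by sorted distance, and the convolution spanning all shells at once are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def shell_conv(p, neighbors, f_prev, n_shells, w_mlp, w_conv):
    """Illustrative ShellConv forward pass for one representative point p.

    p          : (3,)           representative point
    neighbors  : (K, 3)         neighbor coordinates, K = n_shells * shell_size
    f_prev     : (K, C_prev)    previous-layer features of the neighbors
    n_shells   : number of concentric shells S
    w_mlp      : (3, C_local)   weights lifting xyz offsets to local features
    w_conv     : (n_shells, C_prev + C_local, C_out) conv weights over shells
    """
    K = neighbors.shape[0]
    shell_size = K // n_shells

    # 1. Centralize: express neighbors relative to p.
    q = neighbors - p                                   # (K, 3)

    # 2. Lift each offset to a higher-dimensional local feature (toy 1-layer MLP).
    f_local = np.maximum(q @ w_mlp, 0.0)                # (K, C_local), ReLU

    # 3. Concatenate previous-layer and local features.
    f = np.concatenate([f_prev, f_local], axis=1)       # (K, C_prev + C_local)

    # 4. Assign points to shells by distance to p: the shell_size closest
    #    points form the innermost shell, the next shell_size the second, etc.
    order = np.argsort(np.linalg.norm(q, axis=1))
    f = f[order].reshape(n_shells, shell_size, -1)      # (S, shell_size, C)

    # 5. Max-pool within each shell to get one fixed-size feature per shell.
    f_shell = f.max(axis=1)                             # (S, C)

    # 6. Convolve across shells from inner to outer (kernel spans all shells here).
    f_p = np.einsum('sc,sco->o', f_shell, w_conv)       # (C_out,)
    return f_p

# Tiny usage example with made-up sizes: 4 shells x 4 points per shell.
rng = np.random.default_rng(0)
p = rng.normal(size=3)
neighbors = rng.normal(size=(16, 3))
f_prev = rng.normal(size=(16, 8))
w_mlp = rng.normal(size=(3, 32))
w_conv = rng.normal(size=(4, 8 + 32, 64))
print(shell_conv(p, neighbors, f_prev, 4, w_mlp, w_conv).shape)  # (64,)
```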

ShellNet

• N_i = {512, 128, 32}, S_i = {4, 2, 1}, C_i = {128, 256, 512} for i = 0, 1, 2.
• Shell size (the number of points contained in each shell) is set to 16 for classification and 8 for segmentation.
• The network is implemented in TensorFlow and run on an NVIDIA GTX 1080 GPU for all experiments.
• Optimization uses the Adam optimizer with an initial learning rate of 0.001.
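As a rough illustration of how these hyperparameters could parameterize the encoder, the sketch below stacks three stages that successively downsample to N_i representative points, each convolving an S_i-shell neighborhood into C_i channels. It reuses the shell_conv sketch above; the shellconv_stage helper, the random weights, and the brute-force nearest-neighbor gather are hypothetical stand-ins for the released TensorFlow code, not a reproduction of it.

```python
import numpy as np

# Per-stage hyperparameters from the poster: representative points N_i,
# shell counts S_i, and output channels C_i for i = 0, 1, 2.
N = [512, 128, 32]
S = [4, 2, 1]
C = [128, 256, 512]
SHELL_SIZE = 16                       # points per shell (classification setting)

rng = np.random.default_rng(0)

def shellconv_stage(points, feats, n_out, n_shells, c_out, shell_size):
    """Hypothetical stage: randomly sample n_out representative points, gather
    each one's K = n_shells * shell_size nearest neighbors, and apply the
    shell_conv sketch above to produce c_out-channel features."""
    k = n_shells * shell_size
    c_local = 32
    w_mlp = 0.1 * rng.normal(size=(3, c_local))
    w_conv = 0.1 * rng.normal(size=(n_shells, feats.shape[1] + c_local, c_out))

    idx = rng.choice(len(points), size=n_out, replace=False)     # random sampling (setting A)
    reps = points[idx]
    out = np.empty((n_out, c_out))
    for i, p in enumerate(reps):
        nn = np.argsort(np.linalg.norm(points - p, axis=1))[:k]  # brute-force KNN on xyz
        out[i] = shell_conv(p, points[nn], feats[nn], n_shells, w_mlp, w_conv)
    return reps, out

points = rng.normal(size=(1024, 3))   # one toy input cloud of 1024 points
feats = np.zeros((1024, 0))           # no input features at the first stage
for n_out, n_shells, c_out in zip(N, S, C):
    points, feats = shellconv_stage(points, feats, n_out, n_shells, c_out, SHELL_SIZE)
    print(points.shape, feats.shape)  # features: (512, 128) -> (128, 256) -> (32, 512)
```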