Multiscale encoding for single-stage 3D object detector from point clouds
2024, Vol. 29, No. 11, Pages: 3417-3432
Print publication date: 2024-11-16
DOI: 10.11834/jig.230105
Han Junbo, Hu Haiyang, Li Zhongjin, Pan Kailai, Wang Lihong. 2024. Multiscale encoding for single-stage 3D object detector from point clouds. Journal of Image and Graphics, 29(11):3417-3432
Objective
Automatic guided vehicles (AGVs) follow prescribed routes when transporting goods in factories, but on approaching an obstacle they simply stop automatically and cannot perceive the obstacle's exact position and size. To enable AGVs to detect various obstacles in complex industrial scenes, a multiscale encoding for single-stage 3D object detector from point clouds (MSE-SSD) is proposed.
Method
First, the network downsamples the raw point cloud with a learnable foreground-point downsampling module to segment the foreground points accurately. Second, these foreground points are fed into a multi-abstract-scale feature extraction module, which separates feature maps at different abstract scales and fuses them adaptively to reduce the loss of feature information. Then, center points are predicted from the feature map, and a multi-distance-scale feature aggregation module aggregates and encodes the foreground points around each center point at different distance scales to obtain a semantic feature vector. Finally, the center points and the semantic feature vectors jointly predict the bounding boxes.
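The foreground-point selection step described above is essentially score-then-select. A minimal sketch in plain Python, where `score_fn` is a hypothetical stand-in for the learned MLP scorer (not the paper's actual implementation):

```python
def topk_foreground(points, score_fn, k):
    """Select the k points with the highest foreground score.

    points   -- list of (x, y, z) tuples from the raw cloud
    score_fn -- stand-in for the learned MLP that maps a point to a
                foreground score (a hypothetical placeholder here)
    k        -- number of foreground points to keep
    """
    scored = [(score_fn(p), p) for p in points]
    scored.sort(key=lambda sp: sp[0], reverse=True)  # Top-K by score
    return [p for _, p in scored[:k]]

# Toy usage: score by height (z), pretending taller points are foreground.
cloud = [(0.0, 0.0, 0.1), (1.0, 2.0, 1.5), (3.0, 1.0, 0.9)]
picked = topk_foreground(cloud, lambda p: p[2], 2)
# picked == [(1.0, 2.0, 1.5), (3.0, 1.0, 0.9)]
```

In the real network the score is produced by stacked perceptron layers, so the selection is learnable end to end; the Top-K step itself is the same.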
Result
MSE-SSD was evaluated on a custom data set and achieved the best average precision (AP) on several targets: 1.27% higher than the second-ranked IA-SSD (learning highly efficient point-based detectors for 3D LiDAR point clouds) on the empty-AGV class at the hard level and 0.08% higher on the loaded-AGV class at the easy level, and 0.71% higher than the second-ranked SA-SSD (structure aware single-stage 3D object detection from point cloud) on the worker class at the easy level. Running on a single RTX 2080Ti GPU, the network reaches 77 frames/s, the second fastest among all mainstream networks. Deployed on the TXR development board carried by the AGV, the trained network reaches 8.6 frames/s.
Conclusion
MSE-SSD offers high accuracy and real-time performance for AGV obstacle-avoidance detection.
Objective
In today's industrial environment, large-scale automatic production lines are gradually replacing traditional manual production, and the concept of the intelligent factory has received increasing attention from many enterprises. In particular, automatic guided vehicles (AGVs) replace manual handling of goods in many modern factories. The factory pastes a QR code every two meters along the AGV's operating path, and the central control system continuously assigns different meanings to each QR code. When the AGV drives over one of these QR codes, the scanning system at its bottom reads the code to determine whether the next step is to turn, accelerate, lift, or unload heavy objects. When hundreds of AGVs run simultaneously in a workshop, the central control system plans the most efficient paths and transmits the control information to each AGV through the QR codes, realizing intelligent transportation of goods in the factory. When an obstacle appears in front of an AGV, regardless of whether the object would actually hinder its operation, the common solution is to issue a stop signal as soon as the front sensor detects an object. In an environment crowded with people or goods, the working efficiency of the AGV is therefore substantially reduced by frequent stops. Providing the AGV with specific information about the obstacles ahead is thus necessary for effective obstacle avoidance. To this end, a multiscale encoding for single-stage 3D object detector from point clouds (MSE-SSD) is introduced to help AGVs detect various obstacles in complex industrial scenes.
Method
First, a learnable foreground-point downsampling module samples the point cloud so that the foreground points are obtained accurately and efficiently. This module gradually extracts the semantic features of the input point cloud through multilayer perceptron operations and quantifies the semantic features of each point into a foreground score. A Top-K selection then keeps the K points with the highest foreground scores, retaining the foreground points that carry rich target information. Second, the point cloud space containing only foreground points is sent to the multi-abstract-scale feature extraction module. In this module, the point cloud space is compressed into a bird's-eye view (BEV) after voxelization. During BEV feature extraction, three abstract-scale feature maps are extracted from the convolution layers, and attention is used to fuse them adaptively into the final feature map, reducing the feature information lost in the two-dimensional BEV. Although the plant environment is complex, the target information is relatively simple and clear, so the three abstract-scale feature maps can provide almost all of the target semantic information. The final feature map is used to predict a heatmap, which is sent to the next module. The multi-distance-scale feature aggregation module then obtains the center point of each target from the heatmap and aggregates the foreground points near each center point in voxel space. The module quickly retrieves the foreground points through a voxel query and groups them according to their distances from the center point: a foreground point close to the center has a high probability of belonging to that target, whereas a distant foreground point has a low probability. Therefore, networks with different weights encode the groups of foreground points to obtain distance-sensitive multiscale semantic features. Finally, the semantic features and the center point jointly predict the bounding box, where the center point provides the center coordinates and the semantic features predict the confidence, size, and deflection angle of the box.
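The multi-distance-scale aggregation described above can be sketched as binning foreground points into concentric distance shells around a predicted center point, with a separate encoder later applied per shell. The shell radii below are illustrative assumptions, not values from the paper:

```python
import math

def group_by_distance(center, foreground_pts, radii=(0.5, 1.0, 2.0)):
    """Group foreground points into concentric distance shells around a
    target center point.  The radii are illustrative placeholders, not
    the distance scales actually used by MSE-SSD.  Returns one list per
    shell; points beyond the largest radius are dropped.
    """
    shells = [[] for _ in radii]
    for p in foreground_pts:
        d = math.dist(center, p)          # Euclidean distance to center
        for i, r in enumerate(radii):
            if d <= r:
                shells[i].append(p)       # assign to the nearest qualifying shell
                break
    return shells

center = (0.0, 0.0, 0.0)
pts = [(0.1, 0.0, 0.0), (0.8, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
near, mid, far = group_by_distance(center, pts)
# near == [(0.1, 0.0, 0.0)], mid == [(0.8, 0.0, 0.0)], far == [(1.5, 0.0, 0.0)]
```

In the paper's design each shell would then be encoded by a network with its own weights, so that the contribution of a point decays with its distance from the predicted center; the grouping logic itself is what this sketch illustrates.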
Result
The public KITTI and Waymo data sets are used to evaluate the model, and the custom data set is then used to evaluate its real-world performance. On the KITTI test set, MSE-SSD is compared with nine of the most popular current methods: it ranks third in detection speed at 34 frames/s, while its average precision (AP) is nearly on par with the most advanced single-stage detectors. On the Waymo validation set, compared with other single-stage detectors, MSE-SSD ranks first in AP on multiple metrics for the relatively complex targets (pedestrians and bicycles). On the custom data set, three targets are detected: empty AGV, loaded AGV, and pedestrian. At the easy level, the AP of MSE-SSD on the loaded-AGV and pedestrian targets is 0.08% and 0.71% higher than the second-ranked method, respectively, and at the hard level its AP on the empty-AGV target is 1.27% higher than the second-ranked method. Meanwhile, the detection speed of MSE-SSD ranks second at 65 frames/s. Deployed on the TXR development board carried by the AGV, the trained network reaches 7.3 frames/s.
Conclusion
To address the transportation problem in industrial scenes, an obstacle-avoidance detection method for AGVs based on two kinds of point cloud scales is introduced. The method achieves high detection accuracy and speed and provides reliable detection for AGVs running on mobile devices.
Keywords: 3D object detection; single-stage detector; point cloud down-sampling; point cloud feature extraction; point cloud feature aggregation
Charles R Q, Su H, Kaichun M and Guibas L J. 2017. PointNet: deep learning on point sets for 3D classification and segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: #16 [DOI: 10.1109/CVPR.2017.16]
Chen X Z, Ma H M, Wan J, Li B and Xia T. 2017. Multi-view 3D object detection network for autonomous driving//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: #691 [DOI: 10.1109/CVPR.2017.691]
Deng J J, Shi S S, Li P W, Zhou W G, Zhang Y Y and Li H Q. 2021. Voxel R-CNN: towards high performance voxel-based 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2): 1201-1209 [DOI: 10.1609/aaai.v35i2.16207]
Du L, Ye X Q, Tan X, Feng J F, Xu Z B, Ding E R and Wen S L. 2020. Associate-3Ddet: perceptual-to-conceptual association for 3D point cloud object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #1334 [DOI: 10.1109/CVPR42600.2020.01334]
Duan K W, Bai S, Xie L X, Qi H G, Huang Q M and Tian Q. 2019. CenterNet: keypoint triplets for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: #667 [DOI: 10.1109/ICCV.2019.00667]
He C H, Zeng H, Huang J Q, Hua X S and Zhang L. 2020. Structure aware single-stage 3D object detection from point cloud//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #1189 [DOI: 10.1109/CVPR42600.2020.01189]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: #322 [DOI: 10.1109/ICCV.2017.322]
Jin S, Li X P, Yang F and Zhang W G. 2023. 3D object detection in road scenes by pseudo-LiDAR point cloud augmentation. Journal of Image and Graphics, 28(11): 3520-3535 [DOI: 10.11834/jig.220986]
Kim Y, Kim S, Choi J W and Kum D. 2023. CRAFT: camera-radar 3D object detection with spatio-contextual fusion transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1): 1160-1168 [DOI: 10.1609/aaai.v37i1.25198]
Lang A H, Vora S, Caesar H, Zhou L B, Yang J and Beijbom O. 2019. PointPillars: fast encoders for object detection from point clouds//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: #1298 [DOI: 10.1109/CVPR.2019.01298]
Li P L, Chen X Z and Shen S J. 2019. Stereo R-CNN based 3D object detection for autonomous driving//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: #783 [DOI: 10.1109/CVPR.2019.00783]
Li P X, Su S and Zhao H C. 2021. RTS3D: real-time stereo 3D detection from 4D feature-consistency embedding space for autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3): 1930-1939 [DOI: 10.1609/aaai.v35i3.16288]
Liang M, Yang B, Chen Y, Hu R and Urtasun R. 2019. Multi-task multi-sensor fusion for 3D object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: #752 [DOI: 10.1109/CVPR.2019.00752]
Lin T Y, Goyal P, Girshick R, He K M and Dollár P. 2017. Focal loss for dense object detection//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: #324 [DOI: 10.1109/ICCV.2017.324]
Liu Y X, Wang L and Liu M. 2021. YOLOStereo3D: a step back to 2D for efficient stereo 3D detection//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE: #9561423 [DOI: 10.1109/ICRA48506.2021.9561423]
Lyu P F, Wang B Q, Cheng F and Xue J L. 2023. Multi-objective association detection of farmland obstacles based on information fusion of millimeter wave radar and camera. Sensors, 23(1): #230 [DOI: 10.3390/s23010230]
Peng W L, Pan H, Liu H and Sun Y. 2020. IDA-3D: instance-depth-aware 3D object detection from stereo vision for autonomous driving//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #1303 [DOI: 10.1109/CVPR42600.2020.01303]
Qi C R, Liu W, Wu C X, Su H and Guibas L J. 2018. Frustum PointNets for 3D object detection from RGB-D data//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: #102 [DOI: 10.1109/CVPR.2018.00102]
Qi C R, Yi L, Su H and Guibas L J. 2017. PointNet++: deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30: #12413 [DOI: 10.48550/arXiv.1706.02413]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: #690 [DOI: 10.1109/CVPR.2017.690]
Shi S S, Guo C X, Jiang L, Wang Z, Shi J P, Wang X G and Li H S. 2020. PV-RCNN: point-voxel feature set abstraction for 3D object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: #1054 [DOI: 10.1109/CVPR42600.2020.01054]
Shi S S, Jiang L, Deng J J, Wang Z, Guo C X, Shi J P, Wang X G and Li H S. 2023. PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3D object detection. International Journal of Computer Vision, 131(2): 531-551 [DOI: 10.1007/s11263-022-01710-9]
Shi S S, Wang X G and Li H S. 2019. PointRCNN: 3D object proposal generation and detection from point cloud//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: #86 [DOI: 10.1109/CVPR.2019.00086]
Shi S S, Wang Z, Shi J P, Wang X G and Li H S. 2021. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8): 2647-2664 [DOI: 10.1109/TPAMI.2020.2977026]
Simon M, Milz S, Amende K and Gross H M. 2019. Complex-YOLO: an Euler-region-proposal for real-time 3D object detection on point clouds//Proceedings of the Computer Vision - ECCV 2018 Workshops. Munich, Germany: Springer: 197-209 [DOI: 10.1007/978-3-030-11009-3_11]
Sun J M, Chen L H, Xie Y M, Zhang S Y, Jiang Q H, Zhou X W and Bao H J. 2020. Disp R-CNN: stereo 3D object detection via shape prior guided instance disparity estimation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #1056 [DOI: 10.1109/CVPR42600.2020.01056]
Thomas H, Qi C R, Deschaud J E, Marcotegui B, Goulette F and Guibas L. 2019. KPConv: flexible and deformable convolution for point clouds//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: #651 [DOI: 10.1109/ICCV.2019.00651]
Wan X J, Zhou Y Y, Shen M F, Zhou T and Hu F Y. 2023. Multi-scale context information fusion for instance segmentation. Journal of Image and Graphics, 28(2): 495-509 [DOI: 10.11834/jig.211090]
Wei Z Q, Zhang F K, Chang S, Liu Y Y, Wu H C and Feng Z Y. 2022. MmWave radar and vision fusion for object detection in autonomous driving: a review. Sensors, 22(7): #2542 [DOI: 10.3390/s22072542]
Wu H, Wen C L, Li W, Li X, Yang R G and Wang C. 2022a. Transformation-equivariant 3D object detection for autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3): 2795-2802 [DOI: 10.1609/aaai.v37i3.25380]
Wu X P, Peng L, Yang H H, Xie L, Huang C X, Deng C Q, Liu H F and Cai D. 2022b. Sparse fuse dense: towards high quality 3D detection with depth completion//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: #534 [DOI: 10.1109/CVPR52688.2022.00534]
Xu Q G, Sun X D, Wu C Y, Wang P Q and Neumann U. 2020. Grid-GCN for fast and scalable point cloud learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #570 [DOI: 10.1109/CVPR42600.2020.00570]
Yan M, Wang J Z and Li J. 2022. Reliable binocular disparity estimation based on multi-scale similarity recursive search. Journal of Image and Graphics, 27(2): 447-460 [DOI: 10.11834/jig.210551]
Yan Y, Mao Y X and Li B. 2018. SECOND: sparsely embedded convolutional detection. Sensors, 18(10): #3337 [DOI: 10.3390/s18103337]
Yang Z T, Sun Y N, Liu S and Jia J Y. 2020. 3DSSD: point-based 3D single stage object detector//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #1105 [DOI: 10.1109/CVPR42600.2020.01105]
Yang Z T, Sun Y N, Liu S, Shen X Y and Jia J Y. 2019. STD: sparse-to-dense 3D object detector for point cloud//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: #204 [DOI: 10.1109/ICCV.2019.00204]
Yin T W, Zhou X Y and Krähenbühl P. 2021. Center-based 3D object detection and tracking//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: #1161 [DOI: 10.1109/CVPR46437.2021.01161]
Zhang Y F, Hu Q Y, Xu G Q, Ma Y X, Wan J W and Guo Y L. 2022. Not all points are equal: learning highly efficient point-based detectors for 3D LiDAR point clouds//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: #1838 [DOI: 10.1109/CVPR52688.2022.01838]
Zhao N, Chua T S and Lee G H. 2020. SESS: self-ensembling semi-supervised 3D object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE: #1109 [DOI: 10.1109/CVPR42600.2020.01109]
Zheng W, Tang W L, Chen S J, Jiang L and Fu C W. 2021a. CIA-SSD: confident IoU-aware single-stage object detector from point cloud. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4): 3555-3562 [DOI: 10.1609/aaai.v35i4.16470]
Zheng W, Tang W L, Jiang L and Fu C W. 2021b. SE-SSD: self-ensembling single-stage object detector from point cloud//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: #1426 [DOI: 10.1109/CVPR46437.2021.01426]
Zheng Y, Lin C Y, Liao K, Zhao Y and Xue S. 2021c. LiDAR point cloud segmentation through scene viewpoint offset. Journal of Image and Graphics, 26(10): 2514-2523 [DOI: 10.11834/jig.200424]
Zhou Y and Tuzel O. 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: #472 [DOI: 10.1109/CVPR.2018.00472]