Pure camera-based bird’s-eye-view perception in vehicle side and infrastructure side: a review
2024, Vol. 29, No. 5, pp. 1169-1187
Print publication date: 2024-05-16
DOI: 10.11834/jig.230387
Zhou Songran, Lu Yehao, Li Xuewei, Fu Benzun, Wang Jingdong and Li Xi. 2024. Pure camera-based bird’s-eye-view perception in vehicle side and infrastructure side: a review. Journal of Image and Graphics, 29(05): 1169-1187
Pure camera-based bird’s-eye-view (BEV) perception is a frontier direction and research hotspot in autonomous driving worldwide. It aims to generate, from 2D camera images, feature representations of the surrounding road environment from a top-down perspective in 3D space. The field has advanced rapidly for single-vehicle intelligence and has been widely deployed in practice. However, because vehicle-mounted cameras are installed at limited heights, they inevitably face practical problems such as unstable long-range perception and driving blind spots, so single-vehicle intelligence still carries certain safety risks. Roadside cameras, deployed on elevated infrastructure such as traffic-light poles, can effectively extend the perception range of intelligent vehicles and fill in blind-spot views; vehicle-infrastructure cooperation has therefore gradually become a development trend in autonomous driving. Accordingly, starting from the camera deployment side and the camera viewpoint, this paper divides pure camera-based BEV perception into three major directions: vehicle-side single-view perception, vehicle-side surround-view perception, and infrastructure-side fixed-view perception. For each direction, the technical development is traced along the general processing pipeline, and the review is organized around three modules: mainstream datasets, BEV mapping models, and task inference outputs. In addition, this paper introduces the basic principles of the camera imaging system, quantitatively analyzes existing methods in terms of backbone network usage, GPU (graphics processing unit) type usage, and model performance statistics, and qualitatively analyzes them through visual comparison. Finally, the open problems of current pure camera-based BEV perception are revealed from two aspects: technical challenges such as diverse scenes and wide scale distributions, and deployment challenges such as poor transferability of camera geometric parameters and limited computing resources. A comprehensive outlook on the development of this field is then given along four directions: vehicle-infrastructure cooperation, vehicle-vehicle cooperation, virtual-reality interaction, and unified multi-task foundation models. We hope that this summary of existing research and future trends in pure camera-based BEV perception provides researchers in related fields with a comprehensive reference and directions for exploration.
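As background for the camera imaging principles mentioned above, a standard pinhole projection model (a common textbook formulation, not quoted from the paper) maps a world point to a pixel through the intrinsic matrix $K$ and the extrinsics $[R \mid t]$; under the ground-plane assumption used by inverse perspective mapping, a pixel can be projected back to BEV coordinates:

$$
s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid t]\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

$$
Z_w = 0:\qquad \begin{bmatrix} X_w \\ Y_w \\ 1 \end{bmatrix} \propto H^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix},\qquad H = K\,[\,r_1 \;\; r_2 \;\; t\,]
$$

where $r_1$ and $r_2$ are the first two columns of $R$, $(f_x, f_y)$ are the focal lengths in pixels, and $(c_x, c_y)$ is the principal point.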
As a key technology for 3D perception in the autonomous driving domain, pure camera-based bird’s-eye-view (BEV) perception aims to generate a top-down representation of the surrounding traffic environment using only 2D image information captured by cameras. In recent years, it has gained considerable attention in the computer vision research community. The potential of BEV is immense because it can represent image features from multiple camera viewpoints in a unified space and provide explicit position and size information of target objects. While most BEV methods focus on perception from ego-vehicle sensors, the importance of using intelligent roadside cameras to extend perception beyond the ego vehicle’s visual range has gradually been recognized in recent years. However, this novel and growing research field has not been reviewed recently. This paper presents a comprehensive review of pure camera-based BEV perception technology organized by camera deployment side and camera viewpoint, segmented into three categories: 1) vehicle-side single-view perception, 2) vehicle-side surround-view perception, and 3) infrastructure-side fixed-view perception. The typical processing flow, which comprises three primary parts (dataset input, BEV model, and task inference output), is then introduced. In the task inference output section, four typical tasks in the 3D perception of autonomous driving (i.e., 3D object detection, 3D lane detection, BEV map segmentation, and high-definition map generation) are described in detail. To support convenient retrieval, this study summarizes the supported tasks and official links of various datasets and provides open-source code links for representative BEV models in tabular form. The performance of various BEV models on public datasets is also analyzed and compared. To the best of our knowledge, three types of challenging BEV problems remain to be resolved: 1) Scene uncertainty: in open-road scenarios, many scenes never appear in the training dataset, including extreme conditions such as dark nights, strong winds, heavy rain, and thick fog. A model’s reliability must not degrade in these unusual circumstances, yet the majority of BEV models suffer considerable performance degradation when exposed to unseen road scenarios. 2) Scale uncertainty: autonomous driving perception involves targets at extreme scales. For example, in roadside scenarios, placing a camera on a traffic-signal or streetlight pole at least 3 m above the ground helps detect farther targets; however, because distant targets appear at extremely small scales, existing BEV models suffer severe false and missed detections. 3) Camera parameter sensitivity: most existing BEV models depend on precisely calibrated camera intrinsics and extrinsics during training and evaluation; their performance drops drastically when noisy extrinsics or unseen intrinsics are used as input. Meanwhile, a comprehensive outlook on the development of pure camera-based BEV perception is given: 1) vehicle-to-infrastructure (V2I) cooperation: V2I cooperation refers to fusing information from the vehicle side and the infrastructure side to accomplish the visual perception tasks of autonomous driving under communication bandwidth constraints.
The design and implementation of a vehicle-infrastructure integrated perception algorithm can bring remarkable benefits, such as filling in blind spots, expanding the field of view, and improving perception accuracy. 2) Vehicle-to-vehicle (V2V) cooperation: V2V cooperation means that connected autonomous vehicles (CAVs) can share collected data with each other under communication bandwidth constraints. CAVs can collaborate to compensate for data shortages and expand the field of view of vehicles in need, thereby augmenting perception capability, boosting detection accuracy, and improving driving safety. 3) Multitask learning: multitask learning optimizes multiple tasks simultaneously to improve algorithmic efficiency and performance while reducing model complexity. In BEV models, the generated BEV features are amenable to many downstream tasks, such as 3D object detection and BEV map segmentation. A shared model greatly increases the parameter-sharing rate, saves computing cost, shortens training time, and improves generalization. The objective of these endeavors is to provide a comprehensive guide and reference for researchers in related fields by thoroughly summarizing and analyzing existing research and future trends in the field of pure camera-based BEV perception.
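To make the multitask-learning point above concrete, the following is a minimal PyTorch-style sketch of how one shared BEV feature map can feed both a 3D detection head and a BEV map segmentation head. It is an illustrative example rather than the architecture of any specific model in this review, and all module and parameter names (MultiTaskBEVHead, bev_channels, etc.) are hypothetical.

import torch
import torch.nn as nn

class MultiTaskBEVHead(nn.Module):
    """Illustrative sketch: one shared BEV feature map feeds several task heads."""

    def __init__(self, bev_channels=256, num_det_classes=10, num_map_classes=4):
        super().__init__()
        # Shared BEV encoder: refines the BEV features produced by the view transform.
        self.shared = nn.Sequential(
            nn.Conv2d(bev_channels, bev_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(bev_channels),
            nn.ReLU(inplace=True),
        )
        # 3D detection head: per-cell class heatmap plus box regression
        # (x, y, z, w, l, h, sin(yaw), cos(yaw)).
        self.det_cls = nn.Conv2d(bev_channels, num_det_classes, kernel_size=1)
        self.det_reg = nn.Conv2d(bev_channels, 8, kernel_size=1)
        # BEV map segmentation head: per-cell semantic logits (road, lane, crossing, ...).
        self.seg = nn.Conv2d(bev_channels, num_map_classes, kernel_size=1)

    def forward(self, bev_feat):
        # bev_feat: (B, C, H_bev, W_bev), e.g. a 200 x 200 grid around the ego vehicle.
        x = self.shared(bev_feat)
        return {
            "det_cls": self.det_cls(x),
            "det_reg": self.det_reg(x),
            "map_seg": self.seg(x),
        }

# Usage: both tasks reuse the same BEV features, so only the lightweight task heads
# add parameters, which is the parameter-sharing benefit described above.
head = MultiTaskBEVHead()
outputs = head(torch.randn(1, 256, 200, 200))
print({k: tuple(v.shape) for k, v in outputs.items()})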
autonomous driving perception; pure camera-based BEV perception; infrastructure-side perception; vehicle-side perception; multi-view image fusion