Pure camera-based bird’s-eye-view perception in vehicle side and infrastructure side: a review
2024, Vol. 29, No. 5, pp. 1169-1187
Print publication date: 2024-05-16
DOI: 10.11834/jig.230387
Zhou Songran, Lu Yehao, Li Xuewei, Fu Benzun, Wang Jingdong and Li Xi. 2024. Pure camera-based bird’s-eye-view perception in vehicle side and infrastructure side: a review. Journal of Image and Graphics, 29(05): 1169-1187
Pure camera-based bird’s-eye-view (BEV) perception is a frontier direction and research hotspot in autonomous driving worldwide. It aims to generate, from 2D camera images, feature representations of the surrounding road environment from a top-down perspective in 3D space. The field has advanced rapidly for single-vehicle intelligence and has been widely deployed in practice. However, because vehicle-mounted cameras are installed at limited heights, they inevitably face practical problems such as unstable long-range perception and driving blind spots, so single-vehicle intelligence still carries certain safety risks. Roadside cameras, deployed on elevated infrastructure such as traffic-light poles, can effectively extend the perception range of intelligent vehicles and fill in blind-spot views; vehicle-infrastructure cooperation has therefore gradually become a development trend in autonomous driving. Accordingly, starting from the camera deployment side and the camera viewpoint, this paper divides pure camera-based BEV perception into three major directions: vehicle-side single-view perception, vehicle-side surround-view perception, and infrastructure-side fixed-view perception. For each direction, the technical development is traced along the general processing pipeline, and the review is organized around three modules: mainstream datasets, BEV mapping models, and task inference outputs. In addition, this paper introduces the basic principles of the camera imaging system, quantitatively analyzes existing methods in terms of backbone network usage, GPU (graphics processing unit) type usage, and model performance statistics, and qualitatively analyzes them through visual comparison. Finally, the open problems of current pure camera-based BEV perception are revealed from two aspects: technical challenges such as diverse scenes and wide scale distributions, and deployment challenges such as poor transferability of camera geometric parameters and limited computing resources. A comprehensive outlook on the development of this field is then given along four directions: vehicle-infrastructure cooperation, vehicle-vehicle cooperation, virtual-reality interaction, and unified multi-task foundation models. We hope that this summary of existing research and future trends in pure camera-based BEV perception provides researchers in related fields with a comprehensive reference and directions for exploration.
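As background for the camera imaging principles mentioned above, a standard pinhole projection model (a common textbook formulation, not quoted from the paper) maps a world point to a pixel through the intrinsic matrix $K$ and the extrinsics $[R \mid t]$; under the ground-plane assumption used by inverse perspective mapping, a pixel can be projected back to BEV coordinates:

$$
s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid t]\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

$$
Z_w = 0:\qquad \begin{bmatrix} X_w \\ Y_w \\ 1 \end{bmatrix} \propto H^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix},\qquad H = K\,[\,r_1 \;\; r_2 \;\; t\,]
$$

where $r_1$ and $r_2$ are the first two columns of $R$, $(f_x, f_y)$ are the focal lengths in pixels, and $(c_x, c_y)$ is the principal point.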
As a key technology for 3D perception in the autonomous driving domain, pure camera-based bird’s-eye-view (BEV) perception aims to generate a top-down representation of the surrounding traffic environment using only 2D image information captured by cameras. In recent years, it has gained considerable attention in the computer vision research community. The potential of BEV is immense because it can represent image features from multiple camera viewpoints in a unified space and provide explicit position and size information of target objects. While most BEV methods focus on perception from ego-vehicle sensors, the importance of using intelligent roadside cameras to extend perception beyond the ego vehicle’s visual range has gradually been recognized in recent years. However, this novel and growing research field has not been reviewed recently. This paper presents a comprehensive review of pure camera-based BEV perception technology organized by camera deployment side and camera viewpoint, segmented into three categories: 1) vehicle-side single-view perception, 2) vehicle-side surround-view perception, and 3) infrastructure-side fixed-view perception. The typical processing flow, which comprises three primary parts (dataset input, BEV model, and task inference output), is then introduced. In the task inference output section, four typical tasks in the 3D perception of autonomous driving (i.e., 3D object detection, 3D lane detection, BEV map segmentation, and high-definition map generation) are described in detail. To support convenient retrieval, this study summarizes the supported tasks and official links of various datasets and provides open-source code links for representative BEV models in tabular form. The performance of various BEV models on public datasets is also analyzed and compared. To the best of our knowledge, three types of challenging BEV problems remain to be resolved: 1) Scene uncertainty: in open-road scenarios, many scenes never appear in the training dataset, including extreme conditions such as dark nights, strong winds, heavy rain, and thick fog. A model’s reliability must not degrade in these unusual circumstances, yet the majority of BEV models suffer considerable performance degradation when exposed to unseen road scenarios. 2) Scale uncertainty: autonomous driving perception involves targets at extreme scales. For example, in roadside scenarios, placing a camera on a traffic-signal or streetlight pole at least 3 m above the ground helps detect farther targets; however, because distant targets appear at extremely small scales, existing BEV models suffer severe false and missed detections. 3) Camera parameter sensitivity: most existing BEV models depend on precisely calibrated camera intrinsics and extrinsics during training and evaluation; their performance drops drastically when noisy extrinsics or unseen intrinsics are used as input. Meanwhile, a comprehensive outlook on the development of pure camera-based BEV perception is given: 1) vehicle-to-infrastructure (V2I) cooperation: V2I cooperation refers to fusing information from the vehicle side and the infrastructure side to accomplish the visual perception tasks of autonomous driving under communication bandwidth constraints.
The design and implementation of a vehicle-infrastructure integrated perception algorithm can bring remarkable benefits, such as filling in blind spots, expanding the field of view, and improving perception accuracy. 2) Vehicle-to-vehicle (V2V) cooperation: V2V cooperation means that connected autonomous vehicles (CAVs) can share collected data with each other under communication bandwidth constraints. CAVs can collaborate to compensate for data shortages and expand the field of view of vehicles in need, thereby augmenting perception capability, boosting detection accuracy, and improving driving safety. 3) Multitask learning: multitask learning optimizes multiple tasks simultaneously to improve algorithmic efficiency and performance while reducing model complexity. In BEV models, the generated BEV features are amenable to many downstream tasks, such as 3D object detection and BEV map segmentation. A shared model greatly increases the parameter-sharing rate, saves computing cost, shortens training time, and improves generalization. The objective of these endeavors is to provide a comprehensive guide and reference for researchers in related fields by thoroughly summarizing and analyzing existing research and future trends in the field of pure camera-based BEV perception.
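To make the multitask-learning point above concrete, the following is a minimal PyTorch-style sketch of how one shared BEV feature map can feed both a 3D detection head and a BEV map segmentation head. It is an illustrative example rather than the architecture of any specific model in this review, and all module and parameter names (MultiTaskBEVHead, bev_channels, etc.) are hypothetical.

import torch
import torch.nn as nn

class MultiTaskBEVHead(nn.Module):
    """Illustrative sketch: one shared BEV feature map feeds several task heads."""

    def __init__(self, bev_channels=256, num_det_classes=10, num_map_classes=4):
        super().__init__()
        # Shared BEV encoder: refines the BEV features produced by the view transform.
        self.shared = nn.Sequential(
            nn.Conv2d(bev_channels, bev_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(bev_channels),
            nn.ReLU(inplace=True),
        )
        # 3D detection head: per-cell class heatmap plus box regression
        # (x, y, z, w, l, h, sin(yaw), cos(yaw)).
        self.det_cls = nn.Conv2d(bev_channels, num_det_classes, kernel_size=1)
        self.det_reg = nn.Conv2d(bev_channels, 8, kernel_size=1)
        # BEV map segmentation head: per-cell semantic logits (road, lane, crossing, ...).
        self.seg = nn.Conv2d(bev_channels, num_map_classes, kernel_size=1)

    def forward(self, bev_feat):
        # bev_feat: (B, C, H_bev, W_bev), e.g. a 200 x 200 grid around the ego vehicle.
        x = self.shared(bev_feat)
        return {
            "det_cls": self.det_cls(x),
            "det_reg": self.det_reg(x),
            "map_seg": self.seg(x),
        }

# Usage: both tasks reuse the same BEV features, so only the lightweight task heads
# add parameters, which is the parameter-sharing benefit described above.
head = MultiTaskBEVHead()
outputs = head(torch.randn(1, 256, 200, 200))
print({k: tuple(v.shape) for k, v in outputs.items()})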
autonomous driving perception; pure camera-based BEV perception; infrastructure-side perception; vehicle-side perception; multi-view image fusion