端到端自动驾驶系统研究综述
Survey of end-to-end autonomous driving systems
2024年 第29卷 第11期 页码: 3216-3237
纸质出版日期: 2024-11-16
DOI: 10.11834/jig.230787
陈妍妍, 田大新, 林椿眄, 殷鸿博. 2024. 端到端自动驾驶系统研究综述. 中国图象图形学报, 29(11):3216-3237
Chen Yanyan, Tian Daxin, Lin Chunmian, Yin Hongbo. 2024. Survey of end-to-end autonomous driving systems. Journal of Image and Graphics, 29(11):3216-3237
近年深度学习技术助力端到端自动驾驶框架的发展和进步,涌现出一系列创新研究议题与应用部署方案。本文首先以经典的模块化系统切入,对自动驾驶感知—预测—规划—决策4大功能模块进行简要概述,分析传统的模块化和多任务方法的局限性;其次从输入—输出模态到系统架构角度对当前新兴的端到端自动驾驶框架进行广泛地调研,详细描述弱解释性端到端与模块化联合端到端两大主流范式,深入探究现有研究工作存在的不足和弊端;之后简单介绍了端到端自动驾驶系统的开环—闭环评估方法及适用场景;最后总结了端到端自动驾驶系统的研究工作,并从数据挖掘和架构设计角度展望领域潜在挑战和亟待解决的关键问题。
Deep learning technologies have accelerated the development and advancement of end-to-end autonomous driving frameworks in recent years, sparking the emergence of numerous cutting-edge research topics and application deployment solutions. The “divide and conquer” architecture design concept, which constructs multiple independent but related modules, integrates them into the software system in a specific semantic or geometric order, and ultimately deploys these components to the actual vehicle, is the foundation of the majority of autonomous driving systems currently in use, also known as modular systems. However, a well-developed modular design typically comprises thousands of components, placing a considerable burden on the memory and processing capacity of onboard computing platforms. Furthermore, the intrinsic prediction errors of each module accumulate as more modules are stacked, and upstream errors cannot be corrected by downstream modules, presenting a major risk to vehicle safety. A multitask architecture based on the “task parallelism” principle aims to infer multiple tasks efficiently in parallel by attaching multiple decoder heads to a shared backbone network to reduce computational consumption. However, the optimization goals of different tasks may not be consistent, and sharing features indiscriminately can even degrade the overall performance of the system. In contrast to the previous two system architectures, the end-to-end technology paradigm eliminates the information bottlenecks and cumulative errors caused by integrating numerous intermediate components through rule-based interfaces, allowing the network to be continually optimized toward a unified objective. A single large model can generate low-level control signals or a vehicle motion plan from inputs such as sensor data and vehicle status. With sensors serving as inputs, early end-to-end designs based on imitation and reinforcement learning directly output the final control commands for steering, braking, and acceleration. However, such a completely “black box” network, also referred to as a weakly interpretable end-to-end method, provides no explicit representation of the driving scenario. Thus, humans find it difficult to understand the reasoning behind a vehicle’s decisions or predictions, making debugging, validation, and optimization challenging. Even worse, once the model malfunctions or unexpected situations occur, accurately detecting, avoiding, and repairing problems in a timely manner becomes difficult, all of which are crucial for maintaining the safe operation of intelligent vehicles. The component decoupling approach of the conventional modular system facilitates the development and optimization of individual modules, thereby guaranteeing stable representation performance and strong interpretability for each submodule. Unfortunately, this method falls short of unifying goals at the optimization level, that is, integrating optimization and learning toward the ultimate planning goal. A modular joint end-to-end autonomous driving architecture, which preserves the modular driving system while making each module differentiable, is a workable solution to ensure that every module retains sufficient interpretability while the system as a whole can be optimized automatically.
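To make the weakly interpretable paradigm described above concrete, the following is a minimal sketch of an end-to-end policy trained by behavior cloning: a single network maps a camera frame and the vehicle state directly to steering, throttle, and brake commands. It assumes PyTorch, and all module names, feature dimensions, and the synthetic expert batch are illustrative assumptions rather than details of any cited work.

```python
# Minimal sketch (illustrative, not any cited architecture) of a weakly
# interpretable end-to-end policy: raw sensor input + vehicle state -> control commands.
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Visual backbone (stand-in for a larger CNN/Transformer encoder).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse image features with vehicle state (e.g., speed) and regress
        # the final control commands: steering, throttle, brake.
        self.head = nn.Sequential(
            nn.Linear(64 + 1, 128), nn.ReLU(),
            nn.Linear(128, 3),
        )

    def forward(self, image, speed):
        feat = self.backbone(image)
        return self.head(torch.cat([feat, speed], dim=-1))

# One behavior-cloning step: minimize the error against expert control labels.
policy = EndToEndPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
image = torch.randn(8, 3, 128, 128)   # camera frames (synthetic placeholder)
speed = torch.randn(8, 1)             # vehicle state (synthetic placeholder)
expert_cmd = torch.randn(8, 3)        # expert steer/throttle/brake labels (synthetic)
loss = nn.functional.mse_loss(policy(image, speed), expert_cmd)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the driving scene exists only as an implicit feature vector inside this network, no intermediate output can be inspected as a detection, map, or trajectory, which is precisely the interpretability limitation discussed above.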
The basic idea behind this technology lies in constructing a unified neural network that connects all independent modules and enables gradients from the planning module to be propagated back to the initial sensor input for end-to-end optimization. In other words, this kind of approach merely modifies the submodule connection mechanism while maintaining the classic modular technology stack; that is, it substitutes a new implicit interface for the previous explicit interfaces, which were rule-based and required manual design. Modular joint end-to-end methods offer a certain degree of interpretability because of the distinct separation between modules. Such an explicit end-to-end system remains relatively decoupled at the overall design level, and its decision inference follows a logical sequence from perception to prediction and then to planning. By understanding the operational logic underlying this explicit solution, the model can be deliberately adjusted when unknown or uncontrollable results occur. Furthermore, visualization of internal features or intermediate results of specific tasks or modules can be utilized to analyze the decision-making mechanism, which helps prevent the potential risks caused by black-box models and ensures the safe and efficient driving of intelligent vehicles. Therefore, this article conducts a comprehensive analysis of the emerging and promising field of end-to-end autonomous driving, summarizing the main technical routes and representative research methods along the development path of end-to-end driving systems. More specifically, the article begins with the classic modular system, briefly introduces the four functional modules of the autonomous driving pipeline, namely perception, prediction, planning, and decision making, and analyzes the shortcomings of conventional modular and multitask approaches. Subsequently, the emerging end-to-end autonomous driving frameworks are extensively surveyed from the perspectives of input-output modality and system architecture, the two dominant paradigms are described in detail, and the shortcomings and drawbacks of existing research are examined. Existing end-to-end architectures are divided into two categories according to their interpretability: weakly interpretable end-to-end methods, which are explored from the aspects of imitation learning, reinforcement learning, and interpretability; and modular joint end-to-end methods, which are progressively investigated from bird’s-eye-view representation to joint perception and prediction, and ultimately to planning-oriented end-to-end methods. Afterward, the open- and closed-loop evaluation of end-to-end driving systems and the corresponding applicable scenarios are discussed. Finally, research on end-to-end autonomous driving systems is summarized, and the potential challenges and key open problems are discussed from the perspectives of data mining and architecture design.
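The gradient-flow mechanism of the modular joint end-to-end paradigm can likewise be sketched as follows. In this hedged example (again assuming PyTorch; the module granularity, feature dimensions, and trajectory supervision are illustrative assumptions, not a cited architecture), perception, prediction, and planning remain separate modules that exchange learned implicit features instead of hand-crafted interfaces, so a single planning loss backpropagates through the entire stack down to the sensor input.

```python
# Minimal sketch (illustrative) of a modular joint end-to-end pipeline:
# perception -> prediction -> planning, linked by learned features so the
# planning loss can be backpropagated through every module.
import torch
import torch.nn as nn

class Perception(nn.Module):
    """Encode raw sensor input into an implicit scene feature (e.g., a BEV-style feature)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64),
        )
    def forward(self, image):
        return self.encoder(image)

class Prediction(nn.Module):
    """Forecast agent/scene dynamics from the perception feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    def forward(self, scene_feat):
        return self.net(scene_feat)

class Planner(nn.Module):
    """Decode an ego trajectory (horizon x 2 waypoints) from fused features."""
    def __init__(self, horizon=6):
        super().__init__()
        self.net = nn.Linear(64 + 64, horizon * 2)
        self.horizon = horizon
    def forward(self, scene_feat, motion_feat):
        traj = self.net(torch.cat([scene_feat, motion_feat], dim=-1))
        return traj.view(-1, self.horizon, 2)

perception, prediction, planner = Perception(), Prediction(), Planner()
params = list(perception.parameters()) + list(prediction.parameters()) + list(planner.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

image = torch.randn(4, 3, 128, 128)    # sensor input (synthetic placeholder)
expert_traj = torch.randn(4, 6, 2)     # expert future waypoints (synthetic placeholder)

scene_feat = perception(image)
motion_feat = prediction(scene_feat)
plan = planner(scene_feat, motion_feat)

# The planning loss acts as the unified optimization objective; its gradients
# flow back through the prediction and perception modules to the sensor input.
loss = nn.functional.l1_loss(plan, expert_traj)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Intermediate outputs such as the scene and motion features remain available for visualization or auxiliary supervision, which is what gives this family of methods its relative interpretability compared with fully black-box networks.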
人工智能(AI); 自动驾驶; 模块式系统; 端到端系统; 数据驱动; 可解释性
artificial intelligence (AI); autonomous driving; modular driving system; end-to-end system; data driven; interpretability