基于单目视觉惯性的同步定位与地图构建方法综述
A review of monocular visual-inertial SLAM
2024年29卷第10期 页码:2839-2858
纸质出版日期: 2024-10-16
DOI: 10.11834/jig.230863
章国锋, 黄赣, 谢卫健, 陈丹鹏, 王楠, 刘浩敏, 鲍虎军. 2024. 基于单目视觉惯性的同步定位与地图构建方法综述. 中国图象图形学报, 29(10):2839-2858
Zhang Guofeng, Huang Gan, Xie Weijian, Chen Danpeng, Wang Nan, Liu Haomin, Bao Hujun. 2024. A review of monocular visual-inertial SLAM. Journal of Image and Graphics, 29(10):2839-2858
单目视觉惯性同步定位与地图构建(visual-inertial simultaneous localization and mapping,VI-SLAM)技术因具有硬件成本低、无需对外部环境进行布置等优点,得到了广泛关注,在过去的十多年里取得了长足的进步,涌现出诸多优秀的方法和系统。由于实际场景的复杂性,不同方法难免有各自的局限性。虽然已经有一些工作对VI-SLAM方法进行了综述和评测,但大多只针对经典的VI-SLAM方法,已不能充分反映最新的VI-SLAM技术发展现状。本文首先对基于单目VI-SLAM方法的基本原理进行阐述,然后对单目VI-SLAM方法进行分类分析。为了综合全面地对比不同方法之间的优劣势,本文特别选取3个公开数据集对代表性的单目VI-SLAM方法从多个维度上进行定量评测,全面系统地分析了各类方法在实际场景尤其是增强现实应用场景中的性能。实验结果表明,基于优化或滤波和优化相结合的方法一般在跟踪精度和鲁棒性上比基于滤波的方法有优势,直接法/半直接法在全局快门拍摄的情况下精度较高,但容易受卷帘快门和光照变化的影响,尤其是大场景下误差累积较快;结合深度学习可以提高极端情况下的鲁棒性。最后,针对深度学习与V-SLAM/VI-SLAM结合、多传感器融合以及端云协同这3个研究热点,对SLAM的发展趋势进行讨论和展望。
Monocular visual-inertial simultaneous localization and mapping (VI-SLAM) is an important research topic in computer vision and robotics. It aims to estimate the pose (i.e., the position and orientation) of the device in real time using a monocular camera with an inertial sensor while constructing a map of the environment. With the rapid development of fields such as augmented/virtual reality (AR/VR), robotics, and autonomous driving, monocular VI-SLAM has received widespread attention due to its advantages, including low hardware cost and no requirement for an external environment setup. Over the past decade or so, monocular VI-SLAM has made significant progress and spawned many excellent methods and systems. However, because of the complexity of real-world scenarios, different methods have also shown distinct limitations. Although some works have reviewed and evaluated VI-SLAM methods, most of them focus only on classic methods and thus cannot fully reflect the latest development status of VI-SLAM technology. According to the type of state estimator, VI-SLAM can be divided into filtering- and optimization-based methods. Filtering-based methods use filters to fuse observations from visual and inertial sensors, continuously updating the device's state for localization and mapping. Additionally, depending on whether visual data association (or feature matching) is performed separately, existing methods can be divided into indirect methods (or feature-based methods) and direct methods. Furthermore, with the development and widespread application of deep learning technology, researchers have started to incorporate deep learning into VI-SLAM to enhance robustness in extreme conditions or to perform dense reconstruction. This paper first elaborates on the basic principles of monocular VI-SLAM methods and then classifies and analyzes them, covering filtering-, optimization-, feature-, direct, and deep learning-based methods. However, most existing datasets and benchmarks focus on applications such as autonomous driving and drones and mainly evaluate pose accuracy; relatively few datasets have been specifically designed for AR. For a more comprehensive comparison of the advantages and disadvantages of different methods, we select three public datasets to quantitatively evaluate representative monocular VI-SLAM methods from multiple dimensions: the widely used EuRoC dataset, the ZJU-Sensetime dataset suitable for AR applications on mobile platforms, and the low-cost and scalable framework to build localization benchmark (LSFB) dataset aimed at large-scale AR scenarios. We also supplement the ZJU-Sensetime dataset with a more challenging set of sequences, called sequences C, to enhance the variety of data types and evaluation dimensions. This extended dataset is designed to evaluate the robustness of algorithms under extreme conditions such as pure rotation, planar motion, lighting changes, and dynamic scenes. Specifically, sequences C comprise eight sequences, labeled C0–C7. In the C0 sequence, the handheld device moves around a room, performing multiple pure rotational motions. In the C1 sequence, the device is mounted on a stabilized gimbal and moves freely. In the C2 sequence, the device moves in a plane, maintaining a constant height. The C3 sequence includes turning lights on and off during recording. In the C4 sequence, the device overlooks the floor while moving.
The C5 sequence captures an exterior wall with significant parallax and minimal co-visibility, while the C6 sequence involves viewing a monitor during recording, with slight movement and changing screen content. Finally, the C7 sequence involves long-distance recording. On the EuRoC dataset, both filtering- and optimization-based VI-SLAM methods achieve good accuracy. The multi-state constraint Kalman filter (MSCKF), an early filtering-based system, shows lower accuracy and struggles with some sequences. Some methods, such as OpenVINS and RNIN-VIO, enhance accuracy by adding new features and deep learning-based algorithms, respectively. OKVIS, an early optimization-based system, completes all sequences but with lower accuracy. Other methods, such as VINS-Mono, RD-VIO, and ORB-SLAM3, introduce significant optimizations, improving initialization, robustness, and overall accuracy. Direct methods such as DM-VIO and SVO-Pro, which are extended from DSO and SVO, respectively, show significant improvements in accuracy through techniques like delayed marginalization and efficient use of texture information. Adaptive VIO, which is based on deep learning, achieves high accuracy by continuously updating itself through online learning, demonstrating adaptability to new scenarios. Furthermore, on the ZJU-Sensetime dataset, the comparison results of different methods are largely similar to those on EuRoC. The main difference is that the accuracy of the direct method DM-VIO decreases significantly when a rolling shutter camera is used, whereas the semidirect method SVO-Pro performs slightly better. Feature-based methods do not show a significant drop in accuracy, but the smaller field of view (FoV) of phone cameras reduces the robustness of ORB-SLAM3, Kimera, and MSCKF. Additionally, ORB-SLAM3 has high tracking accuracy but lower completeness, while Kimera and MSCKF show increased tracking errors. HybVIO, RNIN-VIO, and RD-VIO have the highest accuracy, with HybVIO slightly outperforming the other two. The deep learning-based Adaptive VIO also shows a significant drop in accuracy and struggles to complete sequences B and C, indicating generalization and robustness issues in complex scenarios. On the LSFB dataset, the comparison results are consistent with those on the small-scale datasets. The methods with the highest accuracy in small scenes, such as RNIN-VIO, HybVIO, and RD-VIO, continue to show high accuracy in large scenes; in particular, RNIN-VIO demonstrates an even more pronounced accuracy advantage in large scenes. In large-scale scenes, many feature points are distant and lack parallax, leading to rapid accumulation of errors, especially in methods that rely heavily on visual constraints. The neural inertial network-based RNIN-VIO makes better use of IMU observations, reducing its dependence on visual data. VINS-Mono also shows significant advantages in large scenes, as its sliding-window optimization facilitates the early inclusion of small-parallax feature points, effectively controlling error accumulation. In contrast, ORB-SLAM3, which relies on local maps, requires sufficient parallax before adding feature points to the local map; this can lead to insufficient visual constraints in distant environments and ultimately cause error accumulation and even tracking loss. The experimental results also show that optimization-based methods and methods combining filtering and optimization generally outperform filtering-based methods in terms of tracking accuracy and robustness.
At the same time, direct/semidirect methods perform well with global-shutter cameras but are susceptible to rolling shutter and lighting changes and are prone to error accumulation, especially in large scenes. Incorporating deep learning can improve robustness in extreme situations. Finally, this work discusses the development trends of SLAM and offers an outlook on three research hotspots: combining deep learning with V-SLAM/VI-SLAM, multi-sensor fusion, and end-cloud collaboration.
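To make the quantitative comparison above concrete, the following sketch shows how the absolute trajectory error (ATE) commonly reported by such benchmarks is computed: the estimated positions are first aligned to the ground truth with a closed-form Umeyama fit (SE(3), or Sim(3) when scale is also aligned), and the RMSE of the remaining position residuals is reported. This is an illustrative NumPy sketch under our own naming, not the evaluation code used in the paper.

import numpy as np

def umeyama_alignment(src, dst, with_scale=False):
    # Closed-form least-squares fit of dst ~ s * R @ src + t (Umeyama, 1991); src and dst are N x 3 arrays.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]           # 3 x 3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = (D * np.diag(S)).sum() / src_c.var(axis=0).sum() if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t

def ate_rmse(est_pos, gt_pos, with_scale=False):
    # Absolute trajectory error: RMSE of position residuals after trajectory alignment.
    s, R, t = umeyama_alignment(est_pos, gt_pos, with_scale)
    aligned = (s * (R @ est_pos.T)).T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt_pos) ** 2, axis=1))))

# Hypothetical usage with time-synchronized N x 3 position arrays:
# rmse = ate_rmse(est_xyz, gt_xyz, with_scale=True)  # Sim(3) alignment

Completeness (the fraction of a sequence successfully tracked) and robustness under the sequences C conditions are reported alongside this positional metric.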
视觉惯性同步定位与地图构建(VI-SLAM)；增强现实(AR)；视觉惯性数据集；多视图几何；多传感器融合
visual-inertial SLAM (VI-SLAM); augmented reality (AR); visual-inertial dataset; multiple-view geometry; multi-sensor fusion
Almalioglu Y, Turan M, Saputra M R U, De Gusmão P P B, Markham A and Trigoni N. 2022. SelfVIO: self-supervised deep monocular visual-inertial odometry and depth estimation. Neural Networks, 150: 119-136 [DOI: 10.1016/j.neunet.2022.03.005]
Ba Y H, Gilbert A, Wang F, Yang J F, Chen R, Wang Y Q, Yan L, Shi B X and Kadambi A. 2020. Deep shape from polarization//Proceedings of the 16th European Conference on Computer Vision-ECCV 2020. Glasgow, UK: Springer: 554-571 [DOI: 10.1007/978-3-030-58586-0_33]
Bao H J, Xie W J, Qian Q H, Chen D P, Zhai S J, Wang N and Zhang G F. 2022. Robust tightly-coupled visual-inertial odometry with pre-built maps in high latency situations. IEEE Transactions on Visualization and Computer Graphics, 28(5): 2212-2222 [DOI: 10.1109/TVCG.2022.3150495]
Bryner S, Gallego G, Rebecq H and Scaramuzza D. 2019. Event-based, direct camera tracking from a photometric 3D map using nonlinear optimization//Proceedings of 2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE: 325-331 [DOI: 10.1109/ICRA.2019.8794255]
Buchanan R, Agrawal V, Camurri M, Dellaert F and Fallon M. 2023. Deep IMU bias inference for robust visual-inertial odometry with factor graphs. IEEE Robotics and Automation Letters, 8(1): 41-48 [DOI: 10.1109/LRA.2022.3222956]
Burri M, Nikolic J, Gohl P, Schneider T, Rehder J, Omari S, Achtelik M W and Siegwart R. 2016. The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research, 35(10): 1157-1163 [DOI: 10.1177/0278364915620033]
Cadena C, Carlone L, Carrillo H, Latif Y, Scaramuzza D, Neira J, Reid I and Leonard J J. 2016. Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Transactions on Robotics, 32(6): 1309-1332 [DOI: 10.1109/TRO.2016.2624754]
Campos C, Elvira R, Rodríguez J J G, Montiel J M M and Tardós J D. 2021. ORB-SLAM3: an accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6): 1874-1890 [DOI: 10.1109/TRO.2021.3075644]
Chen C H, Lu X X, Markham A and Trigoni N. 2018. IONet: learning to cure the curse of drift in inertial odometry//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI: 6468-6476 [DOI: 10.1609/aaai.v32i1.12102]
Chen D P, Wang N, Xu R S, Xie W J, Bao H J and Zhang G F. 2021b. RNIN-VIO: robust neural inertial navigation aided visual-inertial odometry in challenging scenes//2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Bari, Italy: IEEE: 275-283 [DOI: 10.1109/ISMAR52148.2021.00043]
Clark R, Wang S, Wen H K, Markham A and Trigoni N. 2017. VINet: visual-inertial odometry as a sequence-to-sequence learning problem//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 3995-4001 [DOI: 10.1609/aaai.v31i1.11215]
Cortés S, Solin A, Rahtu E and Kannala J. 2018. ADVIO: an authentic dataset for visual-inertial odometry//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 425-440 [DOI: 10.1007/978-3-030-01249-6_26]
Cui Z P, Gu J W, Shi B X, Tan P and Kautz J. 2017. Polarimetric multi-view stereo//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 369-378 [DOI: 10.1109/CVPR.2017.47]
Delmerico J and Scaramuzza D. 2018. A benchmark comparison of monocular visual-inertial odometry algorithms for flying robots//Proceedings of 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE: 2502-2509 [DOI: 10.1109/ICRA.2018.8460664]
Deng T C, Chen Y H, Zhang L Y, Yang J F, Yuan S H, Wang D W and Chen W D. 2024. Compact 3D Gaussian splatting for dense visual SLAM [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2403.11247.pdf
DeTone D, Malisiewicz T and Rabinovich A. 2018. SuperPoint: self-supervised interest point detection and description//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Salt Lake City, USA: IEEE: 337-349 [DOI: 10.1109/CVPRW.2018.00060]
Engel J, Koltun V and Cremers D. 2018. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3): 611-625 [DOI: 10.1109/TPAMI.2017.2658577]
Faragher R and Harle R. 2015. Location fingerprinting with bluetooth low energy beacons. IEEE Journal on Selected Areas in Communications, 33(11): 2418-2428 [DOI: 10.1109/JSAC.2015.2430281]
Forster C, Pizzoli M and Scaramuzza D. 2014. SVO: fast semi-direct monocular visual odometry//Proceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE: 15-22 [DOI: 10.1109/ICRA.2014.6906584]
Forster C, Zhang Z C, Gassner M, Werlberger M and Scaramuzza D. 2017a. SVO: semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 33(2): 249-265 [DOI: 10.1109/TRO.2016.2623335]
Forster C, Carlone L, Dellaert F and Scaramuzza D. 2017b. On-manifold preintegration for real-time visual-inertial odometry. IEEE Transactions on Robotics, 33(1): 1-21 [DOI: 10.1109/TRO.2016.2597321]
Fu T M, Su S S, Lu Y R and Wang C. 2024. iSLAM: imperative SLAM [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2306.07894.pdf
Gálvez-López D and Tardós J D. 2012. Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28(5): 1188-1197 [DOI: 10.1109/TRO.2012.2197158]
Geneva P, Eckenhoff K, Lee W, Yang Y L and Huang G Q. 2020. OpenVINS: a research platform for visual-inertial estimation//Proceedings of 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE: 4666-4672 [DOI: 10.1109/ICRA40945.2020.9196524]
Han L M, Lin Y M, Du G G and Lian S G. 2019. DeepVIO: self-supervised deep learning of monocular visual inertial odometry using 3D geometric constraints//Proceedings of 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE: 6906-6913 [DOI: 10.1109/IROS40897.2019.8968467]
Harris C and Stephens M. 1988. A combined corner and edge detector//Proceedings of Alvey Vision Conference 1988. Manchester, UK: [s.n.]: 147-152
Herath S, Yan H and Furukawa Y. 2020. RoNIN: robust neural inertial navigation in the wild: benchmark, evaluations, and new methods//Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). Paris, France: IEEE: 3146-3152 [DOI: 10.1109/ICRA40945.2020.9196860]
Hu J R, Chen X H, Feng B Y, Li G L, Yang L J, Bao H J, Zhang G F and Cui Z P. 2024. CG-SLAM: efficient dense RGB-D SLAM in a consistent uncertainty-aware 3D Gaussian field [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2403.16095.pdf
Johari M M, Carta C and Fleuret F. 2023. ESLAM: efficient dense SLAM system based on hybrid representation of signed distance fields//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 17408-17419 [DOI: 10.1109/CVPR52729.2023.01670]
Katragadda S, Lee W, Peng Y X, Geneva P, Chen C C, Guo C, Li M Y and Huang G Q. 2023. NeRF-VINS: a real-time neural radiance field map-based visual-inertial navigation system [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2309.09295.pdf
Keetha N, Karhade J, Jatavallabhula K M, Yang G S, Scherer S, Ramanan D and Luiten J. 2024. SplaTAM: splat, track and map 3D Gaussians for dense RGB-D SLAM [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2312.02126.pdf
Koestler L, Yang N, Zeller N and Cremers D. 2022. TANDEM: tracking and dense mapping in real-time using deep multi-view stereo//Proceedings of the 5th Conference on Robot Learning. London, UK: [s.n.]: 34-45
Leutenegger S, Chli M and Siegwart R Y. 2011. BRISK: binary robust invariant scalable keypoints//Proceedings of 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE: 2548-2555 [DOI: 10.1109/ICCV.2011.6126542]
Leutenegger S, Furgale P, Rabaud V, Chli M, Konolige K and Siegwart R. 2013. Keyframe-based visual-inertial SLAM using nonlinear optimization//Proceedings of Robotics: Science and Systems IX (RSS 2013). Berlin, Germany: [s.n.]
Li J Y, Pan X K, Huang G, Zhang Z Y, Wang N, Bao H J and Zhang G F. 2024. RD-VIO: robust visual-inertial odometry for mobile augmented reality in dynamic environments. IEEE Transactions on Visualization and Computer Graphics: 1-14 [DOI: 10.1109/TVCG.2024.3353263]
Li J Y, Yang B B, Chen D P, Wang N, Zhang G F and Bao H J. 2019. Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. Virtual Reality and Intelligent Hardware, 1(4): 386-410 [DOI: 10.1016/j.vrih.2019.07.002]
Liu H M, Jiang M X, Zhang Z, Huang X P, Zhao L S, Hang M, Feng Y J, Bao H J and Zhang G F. 2020a. LSFB: a low-cost and scalable framework for building large-scale localization benchmark//2020 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). Recife, Brazil: IEEE: 219-224 [DOI: 10.1109/ISMAR-Adjunct51615.2020.00065]
Liu H M, Xue H, Zhao L S, Chen D P, Peng Z and Zhang G F. 2023. MagLoc-AR: magnetic-based localization for visual-free augmented reality in large-scale indoor environments. IEEE Transactions on Visualization and Computer Graphics, 29(11): 4383-4393 [DOI: 10.1109/TVCG.2023.3321088]
Liu H M, Zhang G F and Bao H J. 2016. A survey of monocular simultaneous localization and mapping. Journal of Computer-Aided Design and Computer Graphics, 28(6): 855-868
刘浩敏, 章国锋, 鲍虎军. 2016. 基于单目视觉的同时定位与地图构建方法综述. 计算机辅助设计与图形学学报, 28(6): 855-868 [DOI: 10.3969/j.issn.1003-9775.2016.06.001]
Liu H M, Zhao L S, Peng Z, Xie W J, Jiang M X, Zha H, Bao H J and Zhang G F. 2024. A low-cost and scalable framework to build large-scale localization benchmark for augmented reality. IEEE Transactions on Circuits and Systems for Video Technology, 34(4): 2274-2288 [DOI: 10.1109/TCSVT.2023.3306160]
Liu W X, Caruso D, Ilg E, Dong J, Mourikis A I, Daniilidis K, Kumar V and Engel J. 2020b. TLIO: tight learned inertial odometry. IEEE Robotics and Automation Letters, 5(4): 5653-5660 [DOI: 10.1109/LRA.2020.3007421]
Matsuki H, Murai R, Kelly P H J and Davison A J. 2024. Gaussian splatting SLAM [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2312.06741
Matsuki H, Scona R, Czarnowski J and Davison A J. 2021. CodeMapping: real-time dense mapping for sparse SLAM using compact scene representations. IEEE Robotics and Automation Letters, 6(4): 7105-7112 [DOI: 10.1109/LRA.2021.3097258]
Mourikis A I and Roumeliotis S I. 2007. A multi-state constraint Kalman filter for vision-aided inertial navigation//Proceedings of 2007 IEEE International Conference on Robotics and Automation. Rome, Italy: IEEE: 3565-3572 [DOI: 10.1109/ROBOT.2007.364024]
Pan X K, Liu H M, Fang M, Wang Z, Zhang Y and Zhang G F. 2023. Dynamic 3D scenario-oriented monocular SLAM based on semantic probability prediction. Journal of Image and Graphics, 28(7): 2151-2166
潘小鹍, 刘浩敏, 方铭, 王政, 张涌, 章国锋. 2023. 基于语义概率预测的动态场景单目视觉SLAM. 中国图象图形学报, 28(7): 2151-2166 [DOI: 10.11834/jig.210632]
Pan Y Q, Zhou W G, Cao Y D and Zha H B. 2024. Adaptive VIO: deep visual-inertial odometry with online continual learning [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2405.16754.pdf
Pfrommer B, Sanket N, Daniilidis K and Cleveland J. 2017. PennCOSYVIO: a challenging visual inertial odometry benchmark//Proceedings of 2017 IEEE International Conference on Robotics and Automation (ICRA). Singapore: IEEE: 3847-3854 [DOI: 10.1109/ICRA.2017.7989443]
Platinsky L, Szabados M, Hlasek F, Hemsley R, Del Pero L, Pancik A, Baum B, Grimmett H and Ondruska P. 2020. Collaborative augmented reality on smartphones via life-long city-scale maps//Proceedings of 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Porto de Galinhas, Brazil: IEEE: 533-541 [DOI: 10.1109/ISMAR50242.2020.00081]
Qin T, Cao S Z, Pan J and Shen S J. 2019a. A general optimization-based framework for global pose estimation with multiple sensors [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/1901.03642.pdf
Qin T, Li P L and Shen S J. 2018. VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4): 1004-1020 [DOI: 10.1109/TRO.2018.2853729]
Qin T, Pan J, Cao S Z and Shen S J. 2019b. A general optimization-based framework for local odometry estimation with multiple sensors [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/1901.03638.pdf
Rosinol A, Abate M, Chang Y and Carlone L. 2020. Kimera: an open-source library for real-time metric-semantic localization and mapping//Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). Paris, France: IEEE: 1689-1696 [DOI: 10.1109/ICRA40945.2020.9196885]
Rosinol A, Leonard J J and Carlone L. 2022. NeRF-SLAM: real-time dense monocular SLAM with neural radiance fields [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2210.13641.pdf
Sandström E, Li Y, Van Gool L and Oswald M R. 2023. Point-SLAM: dense neural point cloud-based SLAM//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 18387-18398 [DOI: 10.1109/ICCV51070.2023.01690]
Sarlin P E, DeTone D, Malisiewicz T and Rabinovich A. 2020. SuperGlue: learning feature matching with graph neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4937-4946 [DOI: 10.1109/CVPR42600.2020.00499]
Sarlin P E, Dusmanu M, Schönberger J L, Speciale P, Gruber L, Larsson V, Miksik O and Pollefeys M. 2022. LaMAR: benchmarking localization and mapping for augmented reality//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 686-704 [DOI: 10.1007/978-3-031-20071-7_40]
Schönberger J L and Frahm J M. 2016. Structure-from-motion revisited//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4104-4113 [DOI: 10.1109/CVPR.2016.445]
Seiskari O, Rantalankila P, Kannala J, Ylilammi J, Rahtu E and Solin A. 2022. HybVIO: pushing the limits of real-time visual-inertial odometry//Proceedings of 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 287-296 [DOI: 10.1109/WACV51458.2022.00036]
Servières M, Renaudin V, Dupuis A and Antigny N. 2021. Visual and visual-inertial SLAM: state of the art, classification, and experimental benchmarking. Journal of Sensors, 2021: #2054828 [DOI: 10.1155/2021/2054828]
Sucar E, Liu S K, Ortiz J and Davison A J. 2021. iMAP: implicit mapping and positioning in real-time//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 6209-6218 [DOI: 10.1109/ICCV48922.2021.00617]
Sun J M, Shen Z H, Wang Y A, Bao H J and Zhou X W. 2021. LoFTR: detector-free local feature matching with Transformers//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 8922-8931 [DOI: 10.1109/CVPR46437.2021.00881]
Sun L C, Bhatt N P, Liu J C, Fan Z W, Wang Z Y, Humphreys T E and Topcu U. 2024b. MM3DGS SLAM: multi-modal 3D Gaussian splatting for SLAM using vision, depth, and inertial measurements [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2404.00923.pdf
Sun S, Mielle M, Lilienthal A J and Magnusson M. 2024a. High-fidelity SLAM using Gaussian splatting with rendering-guided densification and regularized optimization [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2403.12535.pdf
Tan W, Liu H M, Dong Z L, Zhang G F and Bao H J. 2013. Robust monocular SLAM in dynamic environments//2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Adelaide, Australia: IEEE: 209-218 [DOI: 10.1109/ISMAR.2013.6671781]
Tang C Z and Tan P. 2019. BA-Net: dense bundle adjustment network [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/1806.04807.pdf
Teed Z and Deng J. 2020. RAFT: recurrent all-pairs field transforms for optical flow//Proceedings of the 16th European Conference on Computer Vision-ECCV 2020. Glasgow, UK: Springer: 402-419 [DOI: 10.1007/978-3-030-58536-5_24]
Teed Z and Deng J. 2021. DROID-SLAM: deep visual SLAM for monocular, stereo, and RGB-D cameras//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual: Curran Associates Inc.: 16558-16569
Teed Z, Lipson L and Deng J. 2023. Deep patch visual odometry [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2208.04726.pdf
Von Stumberg L and Cremers D. 2022. DM-VIO: delayed marginalization visual-inertial odometry. IEEE Robotics and Automation Letters, 7(2): 1408-1415 [DOI: 10.1109/LRA.2021.3140129]
Von Stumberg L, Usenko V and Cremers D. 2018. Direct sparse visual-inertial odometry using dynamic marginalization//Proceedings of 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE: 2510-2517 [DOI: 10.1109/ICRA.2018.8462905]
Wang H Y, Wang J W and Agapito L. 2023. Co-SLAM: joint coordinate and sparse parametric encodings for neural real-time SLAM//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 13293-13302 [DOI: 10.1109/CVPR52729.2023.01277]
Wang J K, Zuo X X, Zhao X R, Lyu J J and Liu Y. 2022. Review of multi-source fusion SLAM: current status and challenges. Journal of Image and Graphics, 27(2): 368-389
王金科, 左星星, 赵祥瑞, 吕佳俊, 刘勇. 2022. 多源融合SLAM的现状与挑战. 中国图象图形学报, 27(2): 368-389 [DOI: 10.11834/jig.210547]
Wenzel P, Yang N, Wang R, Zeller N and Cremers D. 2022. 4Seasons: benchmarking visual SLAM and long-term localization for autonomous driving in challenging conditions [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2301.01147.pdf
Wu K J, Ahmed A M, Georgiou G A and Roumeliotis S. 2015. A square root inverse filter for efficient vision-aided inertial navigation on mobile devices//Robotics: Science and Systems. Rome, Italy: MIT Press
Xie W J, Chu G Y, Qian Q H, Yu Y H, Li H, Chen D P, Zhai S J, Wang N, Bao H J and Zhang G F. 2023. Depth completion with multiple balanced bases and confidence for dense monocular SLAM [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2309.04145
Yamaguchi M, Mori S, Saito H, Yachida S and Shibata T. 2020. Global-map-registered local visual odometry using on-the-fly pose graph updates//Proceedings of the 7th International Conference on Augmented Reality, Virtual Reality, and Computer Graphics. Lecce, Italy: Springer: 299-311 [DOI: 10.1007/978-3-030-58465-8_23]
Yan C, Qu D, Xu D, Zhao B, Wang Z, Wang D and Li X. 2024. GS-SLAM: dense visual SLAM with 3D Gaussian splatting//Proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 19595-19604 [DOI: 10.48550/arXiv.2311.11700]
Yan S, Liu Y, Wang L, Shen Z H, Peng Z, Liu H M, Zhang M J, Zhang G F and Zhou X W. 2023. Long-term visual localization with mobile sensors//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 17245-17255 [DOI: 10.1109/CVPR52729.2023.01654]
Yang L W, Tan F T, Li A, Cui Z P, Furukawa Y and Tan P. 2018. Polarimetric dense monocular SLAM//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3857-3866 [DOI: 10.1109/CVPR.2018.00406]
Yang X R, Li H, Zhai H J, Ming Y H, Liu Y Q and Zhang G F. 2022. Vox-Fusion: dense tracking and mapping with voxel-based neural implicit representation//Proceedings of 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Singapore: IEEE: 499-507 [DOI: 10.1109/ISMAR55827.2022.00066]
Ye H Y, Huang H Y and Liu M. 2020. Monocular direct sparse localization in a prior 3D surfel map//Proceedings of 2020 IEEE International Conference on Robotics and Automation (ICRA). Paris, France: IEEE: 8892-8898 [DOI: 10.1109/ICRA40945.2020.9197022]
Ye Z C, Bao C, Zhou X, Liu H M, Bao H J and Zhang G F. 2024. EC-SfM: efficient covisibility-based structure-from-motion for both sequential and unordered images. IEEE Transactions on Circuits and Systems for Video Technology, 34(1): 110-123 [DOI: 10.1109/TCSVT.2023.3285479]
Youssef M and Agrawala A. 2005. The Horus WLAN location determination system//Proceedings of the 3rd International Conference on Mobile Systems, Applications, and Services. Seattle, USA: ACM: 205-218 [DOI: 10.1145/1067170.1067193]
Yu H L, Ye W C, Feng Y J, Bao H J and Zhang G F. 2020. Learning bipartite graph matching for robust visual localization//2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Porto de Galinhas, Brazil: IEEE: 146-155 [DOI: 10.1109/ISMAR50242.2020.00036]
Zuo X, Merrill N, Li W, Liu Y, Pollefeys M and Huang G. 2021. CodeVIO: visual-inertial odometry with learned optimizable dense depth//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE: 14382-14388 [DOI: 10.1109/ICRA48506.2021.9560792]
Zeng Q H, Luo Y X, Sun K C, Li Y N and Liu J Y. 2022. Review on SLAM technology development for vision and its fusion of inertial information. Journal of Nanjing University of Aeronautics and Astronautics, 54(6): 1007-1020
曾庆化, 罗怡雪, 孙克诚, 李一能, 刘建业. 2022. 视觉及其融合惯性的SLAM技术发展综述. 南京航空航天大学学报, 54(6): 1007-1020 [DOI: 10.16356/j.1005-2615.2022.06.002]
Zhang M, Zhang M M, Chen Y M and Li M Y. 2021. IMU data processing for inertial aided navigation: a recurrent neural network based approach//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi'an, China: IEEE: 3992-3998 [DOI: 10.1109/ICRA48506.2021.9561172]
Zhao M H, Chang T, Arun A, Ayyalasomayajula R, Zhang C and Bharadia D. 2021. ULoc: low-power, scalable and cm-accurate UWB-tag localization and tracking for indoor applications. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 5(3): #140 [DOI: 10.1145/3478124]
Zhou Y, Gallego G and Shen S J. 2021. Event-based stereo visual odometry. IEEE Transactions on Robotics, 37(5): 1433-1450 [DOI: 10.1109/TRO.2021.3062252]
Zhu A Z, Atanasov N and Daniilidis K. 2017. Event-based visual inertial odometry//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5816-5824 [DOI: 10.1109/CVPR.2017.616]
Zhu Z, Peng S, Larsson V, Cui Z, Oswald M R, Geiger A and Pollefeys M. 2023. NICER-SLAM: neural implicit scene encoding for RGB SLAM [EB/OL]. [2023-12-08]. https://arxiv.org/pdf/2302.03594
Zhu Z H, Peng S Y, Larsson V, Xu W W, Bao H J, Cui Z P, Oswald M R and Pollefeys M. 2022. NICE-SLAM: neural implicit scalable encoding for SLAM//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12776-12786 [DOI: 10.1109/CVPR52688.2022.01245]