Integrating similarity and interaction force between objects for multiple object tracking

Wang Kai; Dai Fang; Guo Wenyan; Wang Junfeng; Wang Xiaoxia

doi:10.11834/jig.230340

Image Analysis and Recognition

2024, Vol. 29, No. 7, pp. 1984-1997
Print publication date: 2024-07-16

Wang Kai, Dai Fang, Guo Wenyan, Wang Junfeng, Wang Xiaoxia. 2024. Integrating similarity and interaction force between objects for multiple object tracking. Journal of Image and Graphics, 29(07): 1984-1997. DOI: 10.11834/jig.230340.

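Abstract (Chinese version)

Objective

Multi-object tracking is an important research direction in computer vision. To address the low tracking accuracy caused by mistracking and missed tracking, this paper proposes a multi-object tracking algorithm that fuses object similarity and interaction force.

Method

First, the multi-object tracking problem is formulated as a maximum a posteriori (MAP) probability problem. The MAP problem is then mapped onto a network flow, and the minimum cost flow is used to find the optimal paths, which are the object trajectories. The cost between object nodes in the network flow is computed from two aspects: 1) the appearance, motion, and position information of the objects are combined to compute the similarity between objects; 2) the mutual influence between objects is taken into account, and the force between object nodes is computed by reference to the attraction between individuals in the social force model.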
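To make the network-flow step concrete, the following is a minimal sketch of how trajectories could be recovered with a minimum cost flow. It follows the standard construction from the network-flow tracking literature (each detection split into an "in" and an "out" node, plus source and sink), not necessarily the exact graph used in this paper. The function name `min_cost_tracks`, the cost values, and the solver choice (successive shortest paths with Bellman-Ford) are all assumptions made for illustration.

```python
from collections import defaultdict

INF = float("inf")


def min_cost_tracks(frames, link_cost, det_cost=-1.0, entry_cost=0.4, exit_cost=0.4):
    """Return tracks (lists of detection ids) from a unit-capacity min-cost flow.

    frames:    list of frames, each a list of hashable detection ids
    link_cost: {(u, v): cost} for allowed links between detections
    All cost values are hypothetical; lower cost means a more likely event.
    """
    edges = []               # edges[i] = [head, residual capacity, cost]; i ^ 1 is its reverse
    adj = defaultdict(list)  # node -> indices into edges

    def add_edge(u, v, cost):
        adj[u].append(len(edges))
        edges.append([v, 1, cost])
        adj[v].append(len(edges))
        edges.append([u, 0, -cost])

    for frame in frames:
        for d in frame:
            add_edge("S", (d, "in"), entry_cost)       # a track may start at d
            add_edge((d, "in"), (d, "out"), det_cost)  # negative cost rewards using d
            add_edge((d, "out"), "T", exit_cost)       # a track may end at d
    for (u, v), cost in link_cost.items():
        add_edge((u, "out"), (v, "in"), cost)          # association between frames

    nodes = list(adj)
    while True:
        # Bellman-Ford on the residual graph (handles the negative edge costs).
        dist = {n: INF for n in nodes}
        dist["S"] = 0.0
        parent = {}
        for _ in range(len(nodes) - 1):
            changed = False
            for u in nodes:
                if dist[u] == INF:
                    continue
                for i in adj[u]:
                    v, cap, cost = edges[i]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        parent[v] = i
                        changed = True
            if not changed:
                break
        # Stop when one more trajectory would no longer lower the total cost.
        if dist.get("T", INF) >= -1e-9:
            break
        v = "T"
        while v != "S":  # push one unit of flow along the shortest path
            i = parent[v]
            edges[i][1] -= 1
            edges[i ^ 1][1] += 1
            v = edges[i ^ 1][0]

    def next_detection(d):
        for i in adj[(d, "out")]:
            head, cap, _ = edges[i]
            if i % 2 == 0 and cap == 0 and head != "T":  # saturated forward link
                return head[0]
        return None

    tracks = []
    for i in adj["S"]:
        head, cap, _ = edges[i]
        if i % 2 == 0 and cap == 0:  # a unit of flow enters here: a track starts
            track, d = [], head[0]
            while d is not None:
                track.append(d)
                d = next_detection(d)
            tracks.append(track)
    return tracks


# Toy example: two objects over three frames, with made-up link costs.
frames = [["a1", "b1"], ["a2", "b2"], ["a3", "b3"]]
links = {("a1", "a2"): 0.1, ("a1", "b2"): 0.9, ("b1", "a2"): 0.9, ("b1", "b2"): 0.1,
         ("a2", "a3"): 0.1, ("a2", "b3"): 0.9, ("b2", "a3"): 0.9, ("b2", "b3"): 0.1}
tracks = min_cost_tracks(frames, links)
print(tracks)
```

Each unit of flow from the source to the sink corresponds to one trajectory, and the unit capacity on the in-to-out edge keeps trajectories from sharing a detection. A dedicated min-cost flow solver would be preferable on real sequences; Bellman-Ford is used here only to keep the sketch dependency-free.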

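Result

The algorithm was evaluated on three public datasets (MOT15, MOT16, and MOT17) and compared with 12 methods. The experimental results show that it clearly outperforms typical algorithms such as OACDASM (online association by continuous-discrete appearance similarity measurement), STURE (spatial-temporal mutual representation learning), IQHMOT (identity-quantity harmonic multi-object tracking), and GCNNMatch (graph convolutional neural network match) on MOTA (multiple object tracking accuracy), MT (mostly tracked tracklets), ML (mostly lost tracklets), FP (false positives), FN (false negatives), and other indicators. Ablation experiments were carried out on three video sequences selected from the MOT15 dataset (ETH-Bahnhof, TUD-Stadtmitte, and PETS09-S2L1) to verify the data association results after adding the object force. The ablation results show that adding the object force improves tracking accuracy and other indicators, especially in video sequences where occlusion is not obvious.

Conclusion

Building on the multiple features of the objects, this paper adds a force between object nodes, which strengthens data association between objects, reduces the number of mistracked objects, and effectively improves tracking accuracy.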

Abstract

Objective

Object tracking is a critical task in computer vision. Many multi-object tracking algorithms have been proposed, and they usually comprise the following steps: object detection, feature extraction, similarity calculation, data association, and ID assignment. The objects in the video sequence are first detected, and each detected object is labeled with a rectangular bounding box. Features of each object, such as location and appearance, are then extracted. Next, similarity is determined by calculating the probability that detections in adjacent video frames belong to the same object. Finally, data association links the detections of the same object across adjacent frames, and an ID is assigned to each object. This paper focuses on the feature extraction and data association stages: it represents each object with combined features and adds an interaction force between objects to strengthen data association, which addresses mistracking and thus improves tracking accuracy.

Method

First, the multi-object tracking problem is transformed into a maximum a posteriori (MAP) probability problem. Second, the MAP problem is mapped onto a network flow, and the minimum cost flow is used to find the optimal paths, which are the object trajectories. To calculate the cost between object nodes in the network flow, we consider two aspects. First, we compute the similarity between objects by combining their appearance, motion, and position information. Second, we account for the interaction between objects, computing the force between object nodes by reference to the attraction between individuals in the social force model.
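As an illustration of the two cost ingredients, the sketch below combines appearance, motion, and position similarity with a distance-dependent attraction term into a single edge cost. The abstract does not give the paper's formulas, so every functional form here is an assumption: cosine similarity for appearance, a Gaussian score on a constant-velocity prediction for motion, IoU for position, an exponentially decaying attraction loosely inspired by the social force model, and the weights `w` and `lam`. All function and field names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def motion_similarity(prev, curr, sigma=20.0):
    """Gaussian score on the gap between a constant-velocity prediction and curr."""
    px, py = center(prev["box"])
    vx, vy = prev["velocity"]
    cx, cy = center(curr["box"])
    d2 = (px + vx - cx) ** 2 + (py + vy - cy) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def attraction(prev, curr, strength=1.0, scale=50.0):
    """Hypothetical attraction that decays exponentially with center distance."""
    px, py = center(prev["box"])
    cx, cy = center(curr["box"])
    return strength * math.exp(-math.hypot(cx - px, cy - py) / scale)


def link_cost(prev, curr, w=(0.4, 0.3, 0.3), lam=0.5):
    """Edge cost: negative log of a score mixing similarity and attraction."""
    similarity = (w[0] * max(0.0, cosine(prev["feat"], curr["feat"]))
                  + w[1] * motion_similarity(prev, curr)
                  + w[2] * iou(prev["box"], curr["box"]))
    score = (similarity + lam * attraction(prev, curr)) / (1.0 + lam)
    return -math.log(max(score, 1e-6))  # clamp to avoid log(0)


# Made-up detections: `same` continues `prev`, `other` is a distant, different object.
prev = {"box": (100, 100, 150, 200), "velocity": (5, 0), "feat": (1.0, 0.0, 0.0)}
same = {"box": (105, 100, 155, 200), "feat": (0.9, 0.1, 0.0)}
other = {"box": (300, 100, 350, 200), "feat": (0.0, 1.0, 0.0)}
print(link_cost(prev, same), link_cost(prev, other))
```

With these made-up inputs the continuation of the same object receives a much lower cost than the distant candidate, which is the behavior an edge cost in the network flow needs. The relative weighting of similarity and force would have to be tuned or taken from the paper itself.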

Result

Experiments on three public datasets (MOT15, MOT16, and MOT17) and a comparison with 12 recent methods show that the proposed algorithm performs well on multiple object tracking accuracy (MOTA), mostly tracked tracklets (MT), mostly lost tracklets (ML), false positives (FP), false negatives (FN), and other indicators, on which it is significantly better than online association by continuous-discrete appearance similarity measurement (OACDASM), spatial-temporal mutual representation learning (STURE), identity-quantity harmonic multi-object tracking (IQHMOT), graph convolutional neural network match (GCNNMatch), and other typical algorithms. Ablation experiments were carried out on three video sequences from the MOT15 dataset (TUD-Stadtmitte, ETH-Bahnhof, and PETS09-S2L1) to verify the data association results after adding the object force. The ablation results show that adding the object force improves tracking accuracy and other indicators, especially in video sequences where occlusion is not obvious.
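For readers unfamiliar with the indicators, the sketch below shows how MOTA, MT, and ML are commonly computed under the CLEAR MOT and MOTChallenge conventions: MOTA subtracts the error rate (false negatives, false positives, and identity switches over the number of ground-truth boxes) from 1, and a ground-truth trajectory counts as mostly tracked when at least 80% of its lifespan is covered and as mostly lost when at most 20% is. The function names and the input numbers are made up for illustration and are not results from the paper.

```python
def mota(fn, fp, idsw, num_gt):
    """Multiple object tracking accuracy: 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def mt_ml(coverage):
    """Count mostly tracked and mostly lost ground-truth trajectories.

    coverage: for each ground-truth trajectory, the fraction of its
    lifespan that is covered by a track hypothesis.
    """
    mostly_tracked = sum(1 for c in coverage if c >= 0.8)
    mostly_lost = sum(1 for c in coverage if c <= 0.2)
    return mostly_tracked, mostly_lost


# Made-up counts: 300 misses, 100 false alarms, 20 identity switches, 1000 GT boxes.
print(mota(fn=300, fp=100, idsw=20, num_gt=1000))
print(mt_ml([0.95, 0.8, 0.5, 0.1]))
```

Because false positives and identity switches are both counted against the ground-truth total, MOTA can be negative for a tracker that produces many spurious detections.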

Conclusion

In this paper, a force between object nodes is added on top of the objects' multiple features, which strengthens data association between objects, reduces the number of mistracked objects, and effectively improves tracking accuracy.

Keywords (Chinese version)

multi-object tracking (MOT); minimum cost flow; object force; object similarity; social force model

Keywords

multi-object tracking (MOT); minimum cost flow; object force; object similarity; social force model
