Multi-object tracking using adaptive-IoU loss and hierarchical association
2024, Vol. 29, No. 7, Pages 1970-1983
Print publication date: 2024-07-16
DOI: 10.11834/jig.230390
Guo Wen, Liu Qigui, Ding Xinmiao. 2024. Multi-object tracking using adaptive-IoU loss and hierarchical association. Journal of Image and Graphics, 29(07):1970-1983
Objective
To address identity switches caused by ambiguous pedestrian features and the loss of tracking accuracy caused by occlusion between objects in complex scenes, we propose the AIoU-Tracker multi-object tracking algorithm.
Method
First, we design a dedicated AIoU (adaptive intersection over union) regression loss function for the detection head of the backbone network. It evaluates bounding boxes in terms of overlap area, center-point distance, and aspect ratio, which mitigates the identity switches caused by weakly discriminative, ambiguous pedestrian features. Second, we propose a simple and effective hierarchical association strategy: after high-score and low-score detection boxes are associated separately, the embedding information around the detection boxes that failed to associate is exploited for a further round of association, which improves association accuracy under occlusion.
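The abstract does not specify the exact form of the AIoU loss, but since it penalizes overlap area, center-point distance, and aspect ratio, a minimal sketch in the spirit of the DIoU/CIoU family (Zheng et al., 2020) could look like the following; the box format, the epsilon constants, and the trade-off weight alpha are illustrative assumptions rather than the paper's definition.

import math
import torch

def aiou_style_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) tensors of boxes in (x1, y1, x2, y2) format.
    # 1) Overlap-area term: standard IoU.
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # 2) Center-point distance term, normalized by the squared diagonal of
    #    the smallest box enclosing both the prediction and the target.
    center_dist = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 \
                + ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # 3) Aspect-ratio consistency term (same form as in CIoU).
    wp = (pred[:, 2] - pred[:, 0]).clamp(min=eps)
    hp = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    wt = (target[:, 2] - target[:, 0]).clamp(min=eps)
    ht = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4.0 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    with torch.no_grad():
        alpha = v / (1.0 - iou + v + eps)  # hypothetical trade-off weight

    return (1.0 - iou + center_dist / diag + alpha * v).mean()

Intuitively, the IoU term rewards overlap, the distance term keeps pulling the centers together even when the boxes do not overlap, and the aspect-ratio term discourages boxes with a distorted shape.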
Result
In a series of comparative experiments on the MOT16 dataset, the proposed AIoU-Tracker raises HOTA (higher order tracking accuracy) from 58.3% to 59.8%, IDF1 (ID F1 score) from 72.6% to 73.1%, and MOTA (multi-object tracking accuracy) from 69.3% to 74.4% relative to FairMOT; on the MOT17 dataset, HOTA rises from 59.3% to 59.9% and IDF1 from 72.3% to 72.9%.
Conclusion
The proposed feature-balancing tracking method achieves a better balance among the bounding-box size, heat-map, and center-point offset features during training and testing, which makes the multi-object tracking results more accurate.
Objective
Multiple object tracking (MOT) is a mainstream task in computer vision. It aims to estimate the tracklets of multiple objects in videos and has important applications in autonomous driving, human-computer interaction, and human activity recognition. A large number of methods focus on improving tracking performance on top of given detection results, often with the help of re-identification (Re-ID) features. Re-ID-based trackers fall into two categories: separate detection and embedding (SDE) models and joint detection and embedding (JDE) models. An SDE tracker optimizes the detection model and the Re-ID model separately, which prevents it from running in real time. A JDE tracker performs object detection while simultaneously outputting the object locations and appearance embeddings needed for the subsequent association step, and therefore runs considerably faster. However, JDE trackers still suffer from identity switches caused by ambiguous pedestrian features and from degraded tracking accuracy caused by occlusion between objects in complex scenes. We propose an adaptive intersection-over-union (AIoU)-Tracker multi-object tracking algorithm to address these issues.
Method
First, we use the detection head of the backbone network to design a dedicated AIoU regression loss function that measures bounding boxes in terms of overlap area, center-point distance, and aspect ratio. This loss alleviates the identity switches caused by ambiguous pedestrian features. Second, we propose a simple and effective hierarchical association method: high-score and low-score detection boxes are first associated separately, and the embedding information around the detection boxes that failed to associate is then exploited for a further Re-ID-based round of matching, which improves the association accuracy of multi-object tracking under occlusion.
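The three-level matching just described can be sketched as follows; the score threshold, the matching thresholds, the dictionary-style detections, and the three cost callables (an appearance-embedding cost, an IoU cost, and a cost built from embeddings sampled around unmatched boxes) are illustrative assumptions, not the paper's exact implementation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match(cost, max_cost):
    # Hungarian matching on a 2-D cost matrix; pairs whose cost exceeds
    # max_cost are rejected. Returns matched index pairs plus the
    # unmatched row (track) and column (detection) indices.
    if cost.size == 0:
        return [], list(range(cost.shape[0])), list(range(cost.shape[1]))
    rows, cols = linear_sum_assignment(cost)
    pairs = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    used_r = {r for r, _ in pairs}
    used_c = {c for _, c in pairs}
    unmatched_r = [r for r in range(cost.shape[0]) if r not in used_r]
    unmatched_c = [c for c in range(cost.shape[1]) if c not in used_c]
    return pairs, unmatched_r, unmatched_c

def hierarchical_associate(tracks, detections, emb_cost, iou_cost,
                           neighbor_emb_cost, score_thr=0.6,
                           thr1=0.4, thr2=0.5, thr3=0.4):
    # Level 1: tracks vs. high-score detections, appearance-based cost.
    high = [d for d in detections if d["score"] >= score_thr]
    low = [d for d in detections if d["score"] < score_thr]
    level1, un_tracks, un_high = match(emb_cost(tracks, high), thr1)

    # Level 2: tracks left over from level 1 vs. low-score detections,
    # using IoU only (their appearance is usually unreliable).
    rem_tracks = [tracks[i] for i in un_tracks]
    level2, un_tracks2, un_low = match(iou_cost(rem_tracks, low), thr2)

    # Level 3: detections that failed both levels are compared with the
    # remaining tracks using embeddings sampled around their boxes.
    rem_tracks2 = [rem_tracks[i] for i in un_tracks2]
    rem_dets = [high[j] for j in un_high] + [low[j] for j in un_low]
    level3, _, _ = match(neighbor_emb_cost(rem_tracks2, rem_dets), thr3)

    # Indices in level2/level3 are local to each level's candidate lists.
    return level1, level2, level3

In practice, the cost matrices would typically be built from the predicted track positions and the Re-ID embeddings produced by the JDE network, with each cost function returning a NumPy array of shape (number of tracks, number of detections).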
We utilize a variant of the DLA-34 network as the backbone and initialize it with model parameters trained on the common objects in context (COCO) dataset. The experiments are conducted on Ubuntu 16.04 with 64 GB of memory and a GTX2080Ti GPU, with CUDA 10.2. We train the model using the Adam optimizer for 30 epochs with an initial learning rate of 10^-4, decayed to 10^-5 after 20 epochs, and a batch size of 16. Standard data augmentation, including rotation, scaling, and color jittering, is applied. The input images are resized to 1 088 × 608 pixels, and the feature map resolution is 272 × 152 pixels. We evaluate our approach on the MOT Challenge benchmark, specifically the MOT16 and MOT17 datasets. Training draws on several datasets: CrowdHuman and the MIX dataset (ETH, CityPersons, CUHK-SYSU, Caltech, and PRW). The ETH and CityPersons datasets provide only bounding box annotations, so we train only the detection branch on them. The Caltech, MOT17, CUHK-SYSU, and PRW datasets provide both bounding box positions and identity annotations, allowing both branches to be trained. To ensure a fair comparison, we remove the videos that overlap between the ETH dataset and the MOT17 test set. The CrowdHuman dataset contains only bounding box annotations, so we perform self-supervised training on it. To evaluate tracking performance, we use several well-defined metrics, including higher order tracking accuracy (HOTA), multi-object tracking accuracy (MOTA), ID F1 score (IDF1), false positives, false negatives, and the number of identity switches (IDs). MOTA primarily assesses the detection branch, IDF1 evaluates identity preservation and thus focuses on association performance, and HOTA provides a comprehensive evaluation of both detection and data association.
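For reference, the standard definitions of these metrics in the MOT literature (they are not specific to this paper) are:

\[
\mathrm{MOTA} = 1 - \frac{\sum_t\left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}},
\qquad
\mathrm{HOTA}_{\alpha} = \sqrt{\mathrm{DetA}_{\alpha}\cdot\mathrm{AssA}_{\alpha}},
\]

where FN, FP, IDSW, and GT are counted per frame t; IDTP, IDFP, and IDFN are identity-level true positives, false positives, and false negatives; and HOTA is the geometric mean of detection accuracy (DetA) and association accuracy (AssA), averaged over localization thresholds α.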
Result
The performance of our method is compared with that of existing methods on two datasets. 1) On the MOT16 dataset, our method achieves a HOTA value of 59.8%, an improvement of 1.5% over FairMOT; a MOTA value of 74.4%, an improvement of 5.1% over FairMOT; and an IDF1 value of 73.1%, an improvement of 0.5% over FairMOT. 2) On the MOT17 dataset, the HOTA value is 59.9%, an improvement of 0.6% over FairMOT, and the IDF1 value is 72.9%, an improvement of 0.6% over FairMOT. In addition, we conduct ablation studies on the MOT17 dataset to verify the effectiveness of the different components of our method; these studies show that the proposed method clearly outperforms competing multi-object trackers. In the ablation studies, the added AIoU regression loss function reduces the number of identity switches. We also visualize the predicted Re-ID feature extraction positions, the bounding-box size feature, the heat-map feature, and the center-point offset feature; the visualizations show that our method is more robust than FairMOT. Moreover, the hierarchical association method makes association more robust: for example, an object that has been occluded for two frames can still be re-associated with its original ID.
Conclusion
The proposed feature-balancing tracking method achieves a better balance among the bounding-box size feature, heat-map feature, and center-point offset feature during training and testing, which leads to more accurate multi-object tracking. In this study, we propose two improvements to the FairMOT framework. First, we design an AIoU regression loss module to optimize the detection branch, enabling it to optimize targets based on the current optimal distance and to extract more accurate appearance features. Second, we optimize the Re-ID branch with a hierarchical association strategy module that uses three-level matching to enhance the tracker's association performance. Experimental results demonstrate significant improvements on the MOT17 dataset, with HOTA increasing to 59.9%, IDF1 to 72.9%, and MOTA to 70.8%. However, a competition issue remains between the detection and Re-ID branches of the JDE tracking model, which can lower MOTA. Future research will focus on investigating this competition in the JDE tracking model.
Keywords: multi-object tracking (MOT); data association; regression loss; feature balance; hierarchical association method
Bewley A, Ge Z Y, Ott L, Ramos F and Upcroft B. 2016. Simple online and realtime tracking//Proceedings of 2016 IEEE International Conference on Image Processing (ICIP). Phoenix, USA: IEEE: 3464-3468 [DOI: 10.1109/ICIP.2016.7533003]
Bochinski E, Eiselein V and Sikora T. 2017. High-speed tracking-by-detection without using image information//Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Lecce, Italy: IEEE: 1-6 [DOI: 10.1109/AVSS.2017.8078516]
Cai J R, Xu M Z, Li W, Xiong Y J, Xia W, Tu Z W and Soatto S. 2022. MeMOT: multi-object tracking with memory//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE: 8080-8090 [DOI: 10.1109/CVPR52688.2022.00792]
Chan S X, Jia Y W, Zhou X L, Bai C, Chen S Y and Zhang X Q. 2022. Online multiple object tracking using joint detection and embedding network. Pattern Recognition, 130: #108793 [DOI: 10.1016/j.patcog.2022.108793]
Dollár P, Wojek C, Schiele B and Perona P. 2009. Pedestrian detection: a benchmark//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 304-311 [DOI: 10.1109/CVPR.2009.5206631]
Duan K W, Bai S, Xie L X, Qi H G, Huang Q M and Tian Q. 2019. CenterNet: keypoint triplets for object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE: 6568-6577 [DOI: 10.1109/ICCV.2019.00667]
Ess A, Leibe B, Schindler K and van Gool L. 2008. A mobile vision system for robust multi-person tracking//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-8 [DOI: 10.1109/CVPR.2008.4587581]
Han S, Huang P, Wang H, Yu E, Liu D and Pan X. 2022. MAT: motion-aware multi-object tracking. Neurocomputing, 476: 75-86 [DOI: 10.1016/j.neucom.2021.12.104]
He K M, Gkioxari G, Dollár P and Girshick R. 2017. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2980-2988 [DOI: 10.1109/ICCV.2017.322]
Huang G, Liu S C, van der Maaten L and Weinberger K Q. 2018. CondenseNet: an efficient DenseNet using learned group convolutions//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2752-2761 [DOI: 10.1109/CVPR.2018.00291]
Li W, Zhao R, Xiao T and Wang X G. 2014. DeepReID: deep filter pairing neural network for person re-identification//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 152-159 [DOI: 10.1109/CVPR.2014.27]
Liang C, Zhang Z P, Zhou X, Li B, Zhu S Y and Hu W M. 2022. Rethinking the competition between detection and ReID in multiobject tracking. IEEE Transactions on Image Processing, 31: 3182-3196 [DOI: 10.1109/TIP.2022.3165376]
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Luo W H, Xing J L, Milan A, Zhang X Q, Liu W and Kim T K. 2021. Multiple object tracking: a literature review. Artificial Intelligence, 293: #103448 [DOI: 10.1016/j.artint.2020.103448]
Milan A, Leal-Taixé L, Reid I, Roth S and Schindler K. 2016. MOT16: a benchmark for multi-object tracking [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/1603.00831.pdf
Park Y, Dang L M, Lee S, Han D and Moon H. 2021. Multiple object tracking in deep learning approaches: a survey. Electronics, 10(19): #2406 [DOI: 10.3390/electronics10192406]
Peng J L, Wang C G, Wan F B, Wu Y, Wang Y B, Tai Y, Wang C J, Li J L, Huang F Y and Fu Y W. 2020. Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 145-161 [DOI: 10.1007/978-3-030-58548-8_9]
Ren S Q, He K M, Girshick R and Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6): 1137-1149 [DOI: 10.1109/TPAMI.2016.2577031]
Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I and Savarese S. 2019. Generalized intersection over union: a metric and a loss for bounding box regression//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 658-666 [DOI: 10.1109/CVPR.2019.00075]
Shan C B, Wei C B, Deng B, Huang J Q, Hua X S, Cheng X L and Liang K W. 2020. Tracklets predicting based adaptive graph tracking [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/2010.09015.pdf
Shao S, Zhao Z J, Li B X, Xiao T T, Yu G, Zhang X Y and Sun J. 2018. CrowdHuman: a benchmark for detecting human in a crowd [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/1805.00123.pdf
Sun P Z, Cao J K, Jiang Y, Zhang R F, Xie E Z, Yuan Z H, Wang C H and Luo P. 2021a. TransTrack: multiple object tracking with Transformer [EB/OL]. [2023-11-01]. https://arxiv.org/pdf/2012.15460.pdf
Tokmakov P, Li J, Burgard W and Gaidon A. 2021. Learning to track with object permanence//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE: 10840-10849 [DOI: 10.1109/ICCV48922.2021.01068]
Voigtlaender P, Krause M, Osep A, Luiten J, Sekar B B G, Geiger A and Leibe B. 2019. MOTS: multi-object tracking and segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 7934-7943 [DOI: 10.1109/CVPR.2019.00813]
Wang Y X, Kitani K and Weng X S. 2021. Joint object detection and multi-object tracking with graph neural networks//Proceedings of 2021 IEEE International Conference on Robotics and Automation (ICRA). Xi’an, China: IEEE: 13708-13715 [DOI: 10.1109/ICRA48506.2021.9561110]
Wang Z D, Zheng L, Liu Y X, Li Y L and Wang S J. 2020. Towards real-time multi-object tracking//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 107-122 [DOI: 10.1007/978-3-030-58621-8_7]
Wojke N, Bewley A and Paulus D. 2017. Simple online and realtime tracking with a deep association metric//Proceedings of 2017 IEEE International Conference on Image Processing (ICIP). Beijing, China: IEEE: 3645-3649 [DOI: 10.1109/ICIP.2017.8296962]
Xiao T, Li S, Wang B C, Lin L and Wang X G. 2017. Joint detection and identification feature learning for person search//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 3376-3385 [DOI: 10.1109/CVPR.2017.360]
Xu Y H, Ban Y T, Delorme G, Gan C, Rus D and Alameda-Pineda X. 2023. TransCenter: Transformers with dense representations for multiple-object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6): 7820-7835 [DOI: 10.1109/TPAMI.2022.3225078]
Yang F, Chang X, Sakti S, Wu Y and Nakamura S. 2021. ReMOT: a model-agnostic refinement for multiple object tracking. Image and Vision Computing, 106: #104091 [DOI: 10.1016/j.imavis.2020.104091]
Yue Y Y, Xu D, He K J and Zhang H. 2023. An adaptive occlusion-aware multiple targets tracking algorithm for low viewpoint. Journal of Image and Graphics, 28(2): 441-457 [DOI: 10.11834/jig.210853]
Zeng F G, Dong B, Zhang Y A, Wang T C, Zhang X Y and Wei Y C. 2022. MOTR: end-to-end multiple-object tracking with Transformer//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 659-675 [DOI: 10.1007/978-3-031-19812-0_38]
Zhang S S, Benenson R and Schiele B. 2017. CityPersons: a diverse dataset for pedestrian detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 4457-4465 [DOI: 10.1109/CVPR.2017.474]
Zhang Y F, Sun P Z, Jiang Y, Yu D D, Weng F C, Yuan Z H, Luo P, Liu W Y and Wang X G. 2022. ByteTrack: multi-object tracking by associating every detection box//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 1-21 [DOI: 10.1007/978-3-031-20047-2_1]
Zhang Y F, Wang C Y, Wang X G, Zeng W J and Liu W Y. 2021. FairMOT: on the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, 129(11): 3069-3087 [DOI: 10.1007/s11263-021-01513-4]
Zheng L, Zhang H H, Sun S Y, Chandraker M, Yang Y and Tian Q. 2017. Person re-identification in the wild//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 3346-3355 [DOI: 10.1109/CVPR.2017.357]
Zheng L Y, Tang M, Chen Y Y, Zhu G B, Wang J Q and Lu H Q. 2021. Improving multiple object tracking with single object tracking//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 2453-2462 [DOI: 10.1109/CVPR46437.2021.00248]
Zheng Z H, Wang P, Liu W, Li J Z, Ye R G and Ren D W. 2020. Distance-IoU loss: faster and better learning for bounding box regression//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12993-13000 [DOI: 10.1609/aaai.v34i07.6999]
Zhou X Y, Koltun V and Krähenbühl P. 2020. Tracking objects as points//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 474-490 [DOI: 10.1007/978-3-030-58548-8_28]