RGB-D salient object detection algorithm based on complementary information interaction
2024, Vol. 29, No. 5, Pages: 1252-1264
Print publication date: 2024-05-16
DOI: 10.11834/jig.230583
Ye Xinyue, Zhu Lei, Wang Wenwu and Fu Yun. 2024. RGB-D salient object detection algorithm based on complementary information interaction. Journal of Image and Graphics, 29(05): 1252-1264
Objective
By fusing color, depth, and spatial information, salient object detection with RGB-D data typically achieves more accurate predictions than detection from a single modality, and deep learning has further propelled the development of RGB-D salient object detection. However, existing RGB-D deep network models tend to overlook the specificity of the two modalities: they typically combine multimodal features through simple element-wise addition, multiplication, or feature concatenation. Such fusion schemes lack a reasonable explanation of how information should interact between RGB and depth images, and they exploit neither the complementary information between the two modalities nor the potential correlations between them. More effective mechanisms are therefore needed to facilitate information interaction between RGB images and depth images and obtain more accurate detection results. To explore the importance of the complementary information in the two modalities and a more effective way for them to interact, we analyze the gating behavior of the rectified linear unit (ReLU) in conventional convolutional networks and, on this basis, design a new complementary information interaction mechanism between RGB and depth features, which is applied to RGB-D salient object detection for the first time.
Method
First, on the basis of this mechanism, a complementary information interaction module is proposed that uses the "redundant" features of each modality to assist the other. The module is then inserted stage-wise into two lightweight backbone networks, which extract RGB and depth features, respectively, and carry out the interaction between them. The core function of the module is based on a modified ReLU and has a simple structure. At the top layer of the network, a cross-modal feature fusion module is designed to extract the global semantic information of the fused features. This information is fed to every scale of the backbone networks and aggregated with the multiscale features through a neighborhood-scale feature enhancement module. In this way, the model captures local, scale-aware features as well as global semantic information, improving the accuracy and robustness of salient object detection. Finally, three supervision strategies are adopted to supervise the optimization of the proposed model effectively, as sketched below. First, depth-recovery supervision constrains the accuracy of the depth information to ensure the reliability of the depth features. Second, edge supervision guides the model to capture the boundary information of salient objects and improves localization accuracy. Third, deep supervision further improves performance by enforcing consistency between the fused features and the ground-truth saliency map.
Result
Quantitative and qualitative experiments on four widely used public datasets, NJU2K (Nanjing University 2K), NLPR (national laboratory of pattern recognition), STERE (stereo dataset), and SIP (salient person), show that the proposed model outperforms competing methods on three mainstream evaluation measures: Max F-measure, mean absolute error (MAE), and Max E-measure. The model performs particularly well on the SIP dataset, where it achieves the best results. In addition, it reaches a remarkable inference speed of 373.8 frames/s with only 10.8 M parameters. Compared with six other methods, the proposed complementary information interaction module markedly improves salient object detection: by exploiting the complementary information between RGB and depth features and through the cross-modal feature fusion module, the model better captures the global semantic information of salient objects, improving detection accuracy and robustness.
Conclusion
The proposed salient object detection model is built on the complementary information interaction module, lightweight backbone networks, and the cross-modal feature fusion module. It makes full use of the complementary information between RGB and depth features and achieves a notable performance improvement through its network structure and supervision strategies. Compared with other methods, the model delivers better accuracy, robustness, and computational efficiency, demonstrating that the two modalities of RGB-D data carry complementary information and that the model has practical application value. This work deepens the understanding of the importance of multimodal data fusion and promotes research and application in RGB-D salient object detection.
salient object detection (SOD); RGB-D; deep convolutional network; complementary information interaction; cross-modal feature fusion