A survey of weakly supervised semantic segmentation methods based on deep learning
Weakly supervised semantic segmentation based on deep learning
2024, Vol. 29, No. 5, pp. 1146-1168
Print publication date: 2024-05-16
DOI: 10.11834/jig.230628
Xiang Weikang, Zhou Quan, Cui Jingcheng, Mo Zhiyi, Wu Xiaofu, Ou Weihua, Wang Jingdong, Liu Wenyu. 2024. Weakly supervised semantic segmentation based on deep learning. Journal of Image and Graphics, 29(05): 1146-1168
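Semantic segmentation is a fundamental task in computer vision that assigns a semantic category label to every pixel, yielding a pixel-level understanding of the image. Driven by advances in deep learning, fully supervised semantic segmentation methods have made great progress. However, these methods usually require large amounts of training data with pixel-level annotations, and the high annotation cost limits their use in practical scenarios such as autonomous driving, medical image analysis, and industrial control. To reduce annotation cost and broaden the application scenarios of semantic segmentation, researchers are paying increasing attention to deep-learning-based weakly supervised semantic segmentation, which aims to produce pixel-level segmentation predictions from weak annotations such as image-level, bounding-box, scribble, and point annotations. This paper first briefly introduces the semantic segmentation task and analyzes the difficulties faced by fully supervised semantic segmentation, which motivates weakly supervised semantic segmentation. It then introduces the relevant datasets and evaluation metrics. Next, according to the type of weak annotation and the attention each has received, it reviews and discusses research progress from three perspectives: image-level annotations, other weak annotations, and assistance from large-scale models. The second category covers methods based on bounding-box, scribble, and point annotations. Finally, it analyzes the open problems and challenges in the field and suggests possible future research directions, with the aim of further advancing research on weakly supervised semantic segmentation.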
Semantic segmentation is a fundamental task in computer vision. Its goal is to assign a semantic category label to each pixel in an image, achieving pixel-level understanding. It has wide applications in areas such as autonomous driving, virtual reality, and medical image analysis. With the development of deep learning in recent years, remarkable progress has been achieved in fully supervised semantic segmentation, which requires a large amount of training data with pixel-level annotations. However, accurate pixel-level annotations are difficult to provide because they cost substantial time, money, and human labor, which limits the widespread application of fully supervised methods in practice. To reduce the cost of annotating data and further expand the application scenarios of semantic segmentation, researchers are paying increasing attention to weakly supervised semantic segmentation (WSSS) based on deep learning. The goal is to develop a semantic segmentation model that uses weak annotation information instead of dense pixel-level annotations to predict pixel-level segmentation accurately. Weak annotations mainly include image-level, bounding-box, scribble, and point annotations. The key problem in WSSS is how to exploit the limited annotation information, incorporate appropriate training strategies, and design powerful models to bridge the gap between weak supervision and pixel-level annotations. This study aims to classify and summarize WSSS methods based on deep learning, analyze the challenges and problems encountered by recent methods, and provide insights into future research directions. First, we introduce WSSS as a solution to the limitations of fully supervised semantic segmentation. Second, we introduce the related datasets and evaluation metrics.
Third, we review and discuss the research progress of WSSS in three categories: image-level annotations, other weak annotations, and assistance from large-scale models, where the second category includes bounding-box, scribble, and point annotations. Specifically, image-level annotations only provide the categories of the objects contained in the image, without specifying the positions of the target objects. Existing methods typically follow a two-stage training process: first, producing class activation maps (CAMs), which serve as initial seed regions for generating high-quality pixel-level pseudo labels; second, training a fully supervised semantic segmentation model using the produced pixel-level pseudo labels. According to whether the pixel-level pseudo labels are updated during training in the second stage, WSSS based on image-level annotations can be further divided into offline and online approaches. In offline approaches, the two stages are treated independently: the initial seed regions are optimized to obtain more reliable pixel-level pseudo labels, which remain unchanged throughout the second stage. Offline approaches are often divided into six classes according to their optimization strategies: ensemble of CAMs, image erasing, co-occurrence relationship decoupling, affinity propagation, additional supervised information, and self-supervised learning. In online approaches, the pixel-level pseudo labels keep updating throughout training in the second stage, so the production of pixel-level pseudo labels and the semantic segmentation model are jointly optimized. Online approaches can be trained end to end, making the training process more efficient. Compared with image-level annotations, other weak annotations, including bounding-box, scribble, and point annotations, provide stronger supervisory signals.
Among them, bounding-box annotations provide not only object category labels but also information about object positions. The regions outside the bounding box are always considered background, while the regions inside the box contain both foreground and background. Therefore, for bounding-box annotations, existing research mainly focuses on accurately distinguishing foreground from background within the bounding box, thereby producing more accurate pixel-level pseudo labels for training the subsequent semantic segmentation network. Scribble and point annotations not only indicate the categories of objects contained in the image but also provide local positional information about the target objects. For scribble annotations, more complete pseudo labels can be produced to supervise semantic segmentation by inferring the categories of unlabeled regions from the annotated scribbles. For point annotations, the associated semantic information is expanded to the entire image through label propagation, distance metric learning, and loss function optimization. In addition, with the rapid development of large-scale models, this paper further discusses recent research on using large-scale models to assist WSSS tasks. Large-scale models can leverage their pretrained general knowledge to understand images and generate accurate pixel-level pseudo labels, thus improving the final segmentation performance. This paper also reports quantitative segmentation results on the Pattern Analysis, Statistical Modeling and Computational Learning Visual Object Classes 2012 (PASCAL VOC 2012) dataset to compare the performance of different WSSS methods. Finally, four challenges and potential future research directions are provided. First, a performance gap remains between weakly supervised and fully supervised methods. To bridge this gap, research should continue to improve the accuracy of pixel-level pseudo labels.
Second, when WSSS models are applied to real-world scenarios, they may encounter object categories that never appeared in the training data. This requires the models to have sufficient adaptability to identify and segment unknown objects. Third, existing research mainly focuses on improving accuracy without considering the model size and inference speed of WSSS networks, which poses a major challenge for deploying the models in real-world applications that require real-time estimation and online decisions. Fourth, the scarcity of datasets for evaluating different WSSS models and algorithms is also a major obstacle, as it leads to performance degradation and limits generalization capability. Therefore, large-scale WSSS datasets with high quality, great diversity, and wide variation of image types must be constructed.
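The first stage of the two-stage image-level pipeline described above (class activation maps used as seed regions for pixel-level pseudo labels) can be illustrated with a minimal sketch. It assumes the commonly used CAM formulation, in which a class map is the classifier-weighted sum of the last convolutional feature maps, followed by ReLU and max-normalization, and it assumes a fixed threshold for separating foreground from background. The function names, the toy 2 x 2 feature maps, the weights, and the threshold of 0.3 are all hypothetical and are not taken from any method in the survey.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]


def class_activation_map(features: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Weighted sum of feature maps, then ReLU, then max-normalization to [0, 1]."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(features, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * fmap[y][x]
    cam = [[max(v, 0.0) for v in row] for row in cam]  # ReLU
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def cams_to_pseudo_label(
    cams: Dict[int, Grid], threshold: float = 0.3, background: int = 0
) -> List[List[int]]:
    """Assign each pixel the most activated class whose score reaches the
    threshold; pixels with no such class are labeled as background."""
    first = next(iter(cams.values()))
    height, width = len(first), len(first[0])
    label = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_cls, best_score = background, threshold
            for cls, cam in cams.items():
                if cam[y][x] >= best_score:
                    best_cls, best_score = cls, cam[y][x]
            label[y][x] = best_cls
    return label


# Toy input: two 2 x 2 feature maps and hypothetical classifier weights
# for the two classes named by the image-level label (ids 1 and 2).
feature_maps = [
    [[1.0, 0.0], [0.5, 0.0]],
    [[0.0, 1.0], [0.0, 0.2]],
]
class_weights = {1: [1.0, -1.0], 2: [0.0, 2.0]}

cams = {c: class_activation_map(feature_maps, w) for c, w in class_weights.items()}
pseudo_label = cams_to_pseudo_label(cams, threshold=0.3)
print(pseudo_label)
```

In the offline setting such a label map would be refined further and then kept fixed while the segmentation network is trained; in the online setting it would be regenerated as training proceeds. Real systems operate on tensors from a trained classifier rather than nested lists, so this sketch only conveys the data flow.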
Keywords: semantic segmentation; deep learning; weakly supervised semantic segmentation (WSSS); image-level annotation; bounding-box annotation; scribble annotation; point annotation; large-scale model
Ahn J, Cho S and Kwak S. 2019. Weakly supervised learning of instance segmentation with inter-pixel relations//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2204-2213 [DOI: 10.1109/cvpr.2019.00231http://dx.doi.org/10.1109/cvpr.2019.00231]
Ahn J and Kwak S. 2018. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4981-4990 [DOI: 10.1109/cvpr.2018.00523http://dx.doi.org/10.1109/cvpr.2018.00523]
Araslanov N and Roth S. 2020. Single-stage semantic segmentation from image labels//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4252-4261 [DOI: 10.1109/cvpr42600.2020.00431http://dx.doi.org/10.1109/cvpr42600.2020.00431]
Arbelez P, Pont-Tuset J, Barron J, Marques F and Malik J. 2014. Multiscale combinatorial grouping//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 328-335 [DOI: 10.1109/cvpr.2014.49http://dx.doi.org/10.1109/cvpr.2014.49]
Baltrušaitis T, Ahuja C and Morency L P. 2019. Multimodal machine learning: a survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2): 423-443 [DOI: 10.1109/TPAMI.2018.2798607http://dx.doi.org/10.1109/TPAMI.2018.2798607]
Bearman A, Russakovsky O, Ferrari V and Li F F. 2016. What’s the point: semantic segmentation with point supervision//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 549-565 [DOI: 10.1007/978-3-319-46478-7_34http://dx.doi.org/10.1007/978-3-319-46478-7_34]
Chang Y T, Wang Q S, Hung W C, Piramuthu R, Tsai Y H and Yang M H. 2020. Weakly-supervised semantic segmentation via sub-category exploration//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8988-8997 [DOI: 10.1109/cvpr42600.2020.00901http://dx.doi.org/10.1109/cvpr42600.2020.00901]
Chen L Y, Wu W W, Fu C C, Han X and Zhang Y T. 2020. Weakly supervised semantic segmentation with boundary exploration//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 347-362 [DOI: 10.1007/978-3-030-58574-7_21http://dx.doi.org/10.1007/978-3-030-58574-7_21]
Chen Q, Yang L X, Lai J H and Xie X H. 2022a. Self-supervised image-specific prototype exploration for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4278-4288 [DOI: 10.1109/cvpr52688.2022.00425http://dx.doi.org/10.1109/cvpr52688.2022.00425]
Chen T, Yao Y Z and Tang J H. 2023b. Multi-granularity denoising and bidirectional alignment for weakly supervised semantic segmentation. IEEE Transactions on Image Processing, 32: 2960-2971 [DOI: 10.1109/TIP.2023.3275913http://dx.doi.org/10.1109/TIP.2023.3275913]
Chen T, Yao Y Z, Zhang L, Wang Q, Xie G S and Shen F M. 2023c. Saliency guided inter- and intra-class relation constraints for weakly supervised semantic segmentation. IEEE Transactions on Multimedia, 25: 1727-1737 [DOI: 10.1109/tmm.2022.3157481http://dx.doi.org/10.1109/tmm.2022.3157481]
Chen T L, Mai Z D, Li R W and Chao W L. 2023a. Segment anything model (SAM) enhanced pseudo labels for weakly supervised semantic segmentation [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/ 2305.05803.pdfhttps://arxiv.org/pdf/2305.05803.pdf
Chen Z, Tian Z Q, Zhu J H, Li C and Du S Y. 2022b. C-CAM: causal CAM for weakly supervised semantic segmentation on medical image//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11666-11675 [DOI: 10.1109/cvpr52688.2022.01138http://dx.doi.org/10.1109/cvpr52688.2022.01138]
Chen Z Z and Sun Q R. 2023. Extracting class activation maps from non-discriminative features as well//Proceedings of 2023 IEEE/ CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 3135-3144 [DOI: 10.1109/CVPR52729.2023.00306http://dx.doi.org/10.1109/CVPR52729.2023.00306]
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S and Schiele B. 2016. The cityscapes dataset for semantic urban scene understanding//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3213-3223 [DOI: 10.1109/cvpr.2016.350http://dx.doi.org/10.1109/cvpr.2016.350]
Dai J F, He K M and Sun J. 2015. BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1635-1643 [DOI: 10.1109/iccv.2015.191http://dx.doi.org/10.1109/iccv.2015.191]
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 886-893 [DOI: 10.1109/CVPR.2005.177http://dx.doi.org/10.1109/CVPR.2005.177]
Dempster A P, Laird N M and Rubin D B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1): 1-22 [DOI: 10.1111/j.2517-6161.1977.tb01600.xhttp://dx.doi.org/10.1111/j.2517-6161.1977.tb01600.x]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: Transformers for image recognition at scale//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net
Du Y, Fu Z H, Liu Q J and Wang Y H. 2022. Weakly supervised semantic segmentation by pixel-to-prototype contrast//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4310-4319 [DOI: 10.1109/CVPR52688.2022.00428http://dx.doi.org/10.1109/CVPR52688.2022.00428]
Everingham M, Eslami S M A, Van Gool L, Williams C K I, Winn J and Zisserman A. 2015. The Pascal visual object classes challenge: a retrospective. International Journal of Computer Vision, 111(1): 98-136 [DOI: 10.1007/s11263-014-0733-5http://dx.doi.org/10.1007/s11263-014-0733-5]
Gao S H, Li Z Y, Yang M H, Cheng M M, Han J W and Torr P. 2023. Large-scale unsupervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6): 7457-7476 [DOI: 10.1109/TPAMI.2022.3218275http://dx.doi.org/10.1109/TPAMI.2022.3218275]
Grill J B, Strub F, Altché F, Tallec C, Richemond P H, Buchatskaya E, Doersch C, Pires B A, Guo Z D, Azar M G, Piot B, Kavukcuoglu K, Munos R and Valko M. 2020. Bootstrap your own latent a new approach to self-supervised learning//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc: #1786
Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F and Pedreschi D. 2019. A survey of methods for explaining black box models. ACM Computing Surveys, 51(5): 1-42 [DOI: 10.1145/3236009http://dx.doi.org/10.1145/3236009]
He K M, Fan H Q, Wu Y X, Xie S N and Girshick R. 2020. Momentum contrast for unsupervised visual representation learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9726-9735 [DOI: 10.1109/cvpr42600.2020.00975http://dx.doi.org/10.1109/cvpr42600.2020.00975]
Hinton G E and Salakhutdinov R R. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504-507 [DOI: 10.1126/science.1127647http://dx.doi.org/10.1126/science.1127647]
Hou Q B, Jiang P T, Wei Y C and Chen M M. 2018. Self-erasing network for integral object attention//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc: 547-557
Huang Z L, Wang X G, Wang J S, Liu W Y and Wang J D. 2018. Weakly-supervised semantic segmentation network with deep seeded region growing//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7014-7023 [DOI: 10.1109/CVPR.2018.00733http://dx.doi.org/10.1109/CVPR.2018.00733]
Jiang P T, Han L H, Hou Q B, Cheng M M and Wei Y C. 2022a. Online attention accumulation for weakly supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10): 7062-7077 [DOI: 10.1109/tpami.2021.3092573http://dx.doi.org/10.1109/tpami.2021.3092573]
Jiang P T, Hou Q B, Cao Y, Cheng M M, Wei Y C and Xiong H K. 2019. Integral object mining via online attention accumulation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 2070-2079 [DOI: 10.1109/iccv.2019.00216http://dx.doi.org/10.1109/iccv.2019.00216]
Jiang P T, Yang Y Q, Hou Q B and Wei Y C. 2022b. L2G: a simple local-to-global knowledge transfer framework for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16865-16875 [DOI: 10.1109/cvpr52688.2022.01638http://dx.doi.org/10.1109/cvpr52688.2022.01638]
Jing L L and Tian Y L. 2021. Self-supervised visual feature learning with deep neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11): 4037-4058 [DOI: 10.1109/TPAMI.2020.2992393http://dx.doi.org/10.1109/TPAMI.2020.2992393]
Jo S and Yu I J. 2021. Puzzle-CAM: improved localization via matching partial and full features//Proceedings of 2021 IEEE International Conference on Image Processing. Anchorage, USA: IEEE: 639-643 [DOI: 10.1109/icip42928.2021.9506058http://dx.doi.org/10.1109/icip42928.2021.9506058]
Joon Oh S, Benenson R, Khoreva A, Akata Z, Fritz M and Schiele B. 2017. Exploiting saliency for object segmentation from image level labels//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5038-5047 [DOI: 10.1109/cvpr.2017.535http://dx.doi.org/10.1109/cvpr.2017.535]
Ke T W, Hwang J J and Yu S X. 2021. Universal weakly supervised segmentation by pixel-to-segment contrastive learning//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net
Khoreva A, Benenson R, Hosang J, Hein M and Schiele B. 2017. Simple does it: weakly supervised instance and semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1665-1674 [DOI: 10.1109/cvpr.2017.181http://dx.doi.org/10.1109/cvpr.2017.181]
Kim D, Cho D, Yoo D and Kweon I S. 2017. Two-phase learning for weakly supervised object localization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3554-3564 [DOI: 10.1109/iccv.2017.382http://dx.doi.org/10.1109/iccv.2017.382]
Kirillov A, Mintun E, Ravi N, Mao H Z, Rolland C, Gustafson L, Xiao T T, Whitehead S, Berg A C, Lo W Y, Dollr P and Girshick R. 2023. Segment anything [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2304.02643.pdfhttps://arxiv.org/pdf/2304.02643.pdf
Kolesnikov A and Lampert C H. 2016. Seed, expand and constrain: three principles for weakly-supervised image segmentation//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 695-711 [DOI: 10.1007/978-3-319-46493-0_42http://dx.doi.org/10.1007/978-3-319-46493-0_42]
Krähenbühl P and Koltun V. 2011. Efficient inference in fully connected CRFs with Gaussian edge potentials//Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, Spain: Curran Associates Inc: 109-117
Kulharia V, Chandra S, Agrawal A, Torr P and Tyagi A. 2020. Box2Seg: attention weighted loss and discriminative feature learning for weakly supervised segmentation//Proceedings of the 16th European Conference on Computer Vision. Online: Springer: 290-308 [DOI: 10.1007/978-3-030-58583-9_18http://dx.doi.org/10.1007/978-3-030-58583-9_18]
Kweon H, Yoon S H, Kim H, Park D and Yoon K J. 2021. Unlocking the potential of ordinary classifier: class-specific adversarial erasing framework for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 6974-6983 [DOI: 10.1109/iccv48922.2021.00691http://dx.doi.org/10.1109/iccv48922.2021.00691]
Kweon H, Yoon S H and Yoon K J. 2023. Weakly supervised semantic segmentation via adversarial learning of classifier and reconstructor//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 11329-11339 [DOI: 10.1109/CVPR52729.2023.01090http://dx.doi.org/10.1109/CVPR52729.2023.01090]
Lafferty J D, McCallum A and Pereira F C N. 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data//Proceedings of the 18th International Conference on Machine Learning. Williams College, USA: Morgan Kaufmann Publishers Inc
Lee J, Kim E, Lee S, Lee J and Yoon S. 2019. FickleNet: weakly and semi-supervised semantic image segmentation using stochastic inference//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Angeles, USA: IEEE: 5262-5271 [DOI: 10.1109/CVPR.2019.00541http://dx.doi.org/10.1109/CVPR.2019.00541]
Lee J, Kim E and Yoon S. 2021a. Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 4070-4078 [DOI: 10.1109/cvpr46437.2021.00406http://dx.doi.org/10.1109/cvpr46437.2021.00406]
Lee J, Yi J, Shin C and Yoon S. 2021b. BBAM: bounding box attribution map for weakly supervised semantic and instance segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2643-2651 [DOI: 10.1109/cvpr46437.2021.00267http://dx.doi.org/10.1109/cvpr46437.2021.00267]
Lee S, Lee M, Lee J and Shim H. 2021c. Railroad is not a train: saliency as pseudo-pixel supervision for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5491-5501 [DOI: 10.1109/cvpr46437.2021.00545http://dx.doi.org/10.1109/cvpr46437.2021.00545]
Li J, Fan J S and Zhang Z X. 2022a. Towards noiseless object contours for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16835-16844 [DOI: 10.1109/cvpr52688.2022.01635http://dx.doi.org/10.1109/cvpr52688.2022.01635]
Li X Y, Zhou T F, Li J W, Zhou Y and Zhang Z X. 2021. Group-wise semantic mining for weakly supervised semantic segmentation//Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 1984-1992 [DOI: 10.1609/aaai.v35i3.16294http://dx.doi.org/10.1609/aaai.v35i3.16294]
Li Y W, Zhao H S, Qi X J, Chen Y K, Qi L, Wang L W, Li Z M, Sun J and Jia J Y. 2022b. Fully convolutional networks for panoptic segmentation with point-based supervision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4): 4552-4568 [DOI: 10.1109/tpami.2022.3200416http://dx.doi.org/10.1109/tpami.2022.3200416]
Lin D, Dai J F, Jia J Y, He K M and Sun J. 2016. ScribbleSup: scribble-supervised convolutional networks for semantic segmentation//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 3159-3167 [DOI: 10.1109/cvpr.2016.344http://dx.doi.org/10.1109/cvpr.2016.344]
Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollr P and Zitnick C L. 2014. Microsoft COCO: common objects in context//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 740-755 [DOI: 10.1007/978-3-319-10602-1_48http://dx.doi.org/10.1007/978-3-319-10602-1_48]
Lin Y Q, Chen M H, Wang W X, Wu B X, Li K, Lin B B, Liu H F and He X F. 2023. CLIP is also an efficient segmenter: a text-driven approach for weakly supervised semantic segmentation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 15305-15314 [DOI: 10.1109/CVPR52729.2023.01469http://dx.doi.org/10.1109/CVPR52729.2023.01469]
Liu S L, Zeng Z Y, Ren T H, Li F, Zhang H, Yang J, Li C Y, Yang J W, Su H, Zhu J and Zhang L. 2023. Grounding DINO: marrying DINO with grounded pre-training for open-set object detection [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2303.05499.pdfhttps://arxiv.org/pdf/2303.05499.pdf
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965http://dx.doi.org/10.1109/CVPR.2015.7298965]
Lowe D G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110 [DOI: 10.1023/B:VISI.0000029664.99615.94http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94]
MacQueen J. 1967. Some methods for classification and analysis of multivariate observations//The 5th Berkeley Symposium on Mathematical Statistics and Probability. Oakland, USA: Unversity of California Press: 281-297
Maninis K K, Caelles S, Pont-Tuset J and van Gool L. 2018. Deep extreme cut: from extreme points to object segmentation//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 616-625 [DOI: 10.1109/cvpr.2018.00071http://dx.doi.org/10.1109/cvpr.2018.00071]
McEver R A and Manjunath B S. 2020. PCAMs: weakly supervised semantic segmentation using point supervision [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2007.05615.pdfhttps://arxiv.org/pdf/2007.05615.pdf
Minaee S, Boykov Y Y, Porikli F, Plaza A J, Kehtarnavaz N and Terzopoulos D. 2022. Image segmentation using deep learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7): 3523-3542 [DOI: 10.1109/TPAMI.2021.3059968http://dx.doi.org/10.1109/TPAMI.2021.3059968]
Mottaghi R, Chen X J, Liu X B, Cho N G, Lee S W, Fidler S, Urtasun R and Yuille A. 2014. The role of context for object detection and semantic segmentation in the wild//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 891-898 [DOI: 10.1109/cvpr.2014.119http://dx.doi.org/10.1109/cvpr.2014.119]
Neuhold G, Ollmann T, Rota Bulo S and Kontschieder P. 2017. The Mapillary vistas dataset for semantic understanding of street scenes//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4990-4999 [DOI: 10.1109/iccv.2017.534http://dx.doi.org/10.1109/iccv.2017.534]
Oh Y, Kim B and Ham B. 2021. Background-aware pooling and noise-aware loss for weakly-supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 6909-6918 [DOI: 10.1109/cvpr46437.2021.00684http://dx.doi.org/10.1109/cvpr46437.2021.00684]
Ojala T, Pietikainen M and Harwood D. 1994. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions//Proceedings of the 12th International Conference on Pattern Recognition. Jerusalem, Israel: IEEE: 582-585 [DOI: 10.1109/ICPR.1994.576366http://dx.doi.org/10.1109/ICPR.1994.576366]
Pan S J and Yang Q. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10): 1345-1359 [DOI: 10.1109/TKDE.2009.191http://dx.doi.org/10.1109/TKDE.2009.191]
Papadopoulos D P, Uijlings J R R, Keller F and Ferrari V. 2017. Extreme clicking for efficient object annotation//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4940-4949 [DOI: 10.1109/iccv.2017.528http://dx.doi.org/10.1109/iccv.2017.528]
Papandreou G, Chen L C, Murphy K P and Yuille A L. 2015. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 1742-1750 [DOI: 10.1109/iccv.2015.203http://dx.doi.org/10.1109/iccv.2015.203]
Peng Z L, Wang G C, Xie L X, Jiang D S, Shen W and Tian Q. 2023. USAGE: a unified seed area generation paradigm for weakly supervised semantic segmentation//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE [DOI: 10.1109/ICCV51070.2023.00064http://dx.doi.org/10.1109/ICCV51070.2023.00064]
Qian R, Wei Y C, Shi H H, Li J C, Liu J Y and Huang T. 2019. Weakly supervised scene parsing with point-based distance metric learning//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu, USA: AAAI: 8843-8850 [DOI: 10.1609/aaai.v33i01.33018843http://dx.doi.org/10.1609/aaai.v33i01.33018843]
Qing C, Yu J, Xiao C B and Duan J. 2020. Deep convolutional neural network for semantic image segmentation. Journal of Image and Graphics, 25(6): 1069-1090
青晨, 禹晶, 肖创柏, 段娟. 2020. 深度卷积神经网络图像语义分割研究进展. 中国图象图形学报, 25(6): 1069-1090[DOI: 10.11834/jig.190355http://dx.doi.org/10.11834/jig.190355]
Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G and Sutskever I. 2021. Learning transferable visual models from natural language supervision//Proceedings of the 38th International Conference on Machine Learning. [s.l.]: ACM: 8748-8763
Ren D W, Wang Q L, Wei Y C, Meng D Y and Zuo W M. 2022. Progress in weakly supervised learning for visual understanding. Journal of Image and Graphics, 27(6): 1768-1798
任冬伟, 王旗龙, 魏云超, 孟德宇, 左旺孟. 2022. 视觉弱监督学习研究进展. 中国图象图形学报, 27(6): 1768-1798[DOI: 10.11834/jig.220178http://dx.doi.org/10.11834/jig.220178]
Rong S H, Tu B H, Wang Z L and Li J J. 2023. Boundary-enhanced Co-training for weakly supervised semantic segmentation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 19574-19584 [DOI: 10.1109/CVPR52729.2023.01875http://dx.doi.org/10.1109/CVPR52729.2023.01875]
Rother C, Kolmogorov V and Blake A. 2004. “GrabCut”: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3): 309-314 [DOI: 10.1145/1015706.1015720http://dx.doi.org/10.1145/1015706.1015720]
Ru L X, Zhan Y B, Yu B S and Du B. 2022. Learning affinity from attention: end-to-end weakly-supervised semantic segmentation with Transformers//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 16825-16834 [DOI: 10.1109/CVPR52688.2022.01634http://dx.doi.org/10.1109/CVPR52688.2022.01634]
Ru L X, Zheng H L, Zhan Y B and Du B. 2023. Token contrast for weakly-supervised semantic segmentation//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 3093-3102 [DOI: 10.1109/CVPR52729.2023.00302http://dx.doi.org/10.1109/CVPR52729.2023.00302]
Scarselli F, Gori M, Tsoi A C, Hagenbuchner M and Monfardini G. 2009. The graph neural network model. IEEE Transactions on Neural Networks, 20(1): 61-80 [DOI: 10.1109/TNN.2008.2005605http://dx.doi.org/10.1109/TNN.2008.2005605]
Shen W, Peng Z L, Wang X H, Wang H Y, Cen J Z, Jiang D S, Xie L X, Yang X K and Tian Q. 2023. A survey on label-efficient deep image segmentation: bridging the gap between weak supervision and dense prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8): 9284-9305 [DOI: 10.1109/TPAMI.2023.3246102http://dx.doi.org/10.1109/TPAMI.2023.3246102]
Song C F, Huang Y, Ouyang W L and Wang L. 2019. Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Angeles, USA: IEEE: 3136-3145 [DOI: 10.1109/cvpr.2019.00325http://dx.doi.org/10.1109/cvpr.2019.00325]
Su Y K, Sun R Z, Lin G S and Wu Q Y. 2021. Context decoupling augmentation for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 6984-6994 [DOI: 10.1109/iccv48922.2021.00692http://dx.doi.org/10.1109/iccv48922.2021.00692]
Sun G L, Wang W G, Dai J F and van Gool L. 2020. Mining cross-image semantics for weakly supervised semantic segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 347-365 [DOI: 10.1007/978-3-030-58536-5_21http://dx.doi.org/10.1007/978-3-030-58536-5_21]
Sun K Y, Shi H Q, Zhang Z M and Huang Y M. 2021. ECS-Net: improving weakly supervised semantic segmentation by using connections between class activation maps//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 7263-7272 [DOI: 10.1109/iccv48922.2021.00719]
Sun W X, Liu Z Y, Zhang Y H, Zhong Y R and Barnes N. 2023. An alternative to WSSS? An empirical study of the segment anything model (SAM) on weakly-supervised semantic segmentation problems [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2305.01586.pdf
Tang M, Djelouah A, Perazzi F, Boykov Y and Schroers C. 2018a. Normalized cut loss for weakly-supervised CNN segmentation//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1818-1827 [DOI: 10.1109/cvpr.2018.00195]
Tang M, Perazzi F, Djelouah A, Ayed I B, Schroers C and Boykov Y. 2018b. On regularized losses for weakly-supervised CNN segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 524-540 [DOI: 10.1007/978-3-030-01270-0_31]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc: 6000-6010
Vernaza P and Chandraker M. 2017. Learning random-walk label propagation for weakly-supervised semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2953-2961 [DOI: 10.1109/cvpr.2017.315]
Wang B, Qi G J, Tang S, Zhang T Z, Wei Y C, Li L H and Zhang Y D. 2019. Boundary perception guidance: a scribble-supervised semantic segmentation approach//Proceedings of the 28th IJCAI International Joint Conference on Artificial Intelligence. Macao, China: Morgan Kaufmann: 3663-3669 [DOI: 10.24963/ijcai.2019/508]
Wang Y D, Zhang J, Kan M N, Shan S G and Chen X L. 2020. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12272-12281 [DOI: 10.1109/cvpr42600.2020.01229]
Wei Y C, Feng J S, Liang X D, Cheng M M, Zhao Y and Yan S C. 2017. Object region mining with adversarial erasing: a simple classification to semantic segmentation approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6488-6496 [DOI: 10.1109/cvpr.2017.687]
Wei Y C, Xiao H X, Shi H H, Jie Z Q, Feng J S and Huang T S. 2018. Revisiting dilated convolution: a simple approach for weakly- and semi-supervised semantic segmentation//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7268-7277 [DOI: 10.1109/CVPR.2018.00759]
Weinberger K Q and Saul L K. 2009. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10: 207-244
Wu T, Huang J S, Gao G Y, Wei X M, Wei X L, Luo X and Liu C H. 2021. Embedded discriminative attention mechanism for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16760-16769 [DOI: 10.1109/cvpr46437.2021.01649]
Xian Y Q, Lampert C H, Schiele B and Akata Z. 2019. Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9): 2251-2265 [DOI: 10.1109/TPAMI.2018.2857768]
Xie J H, Hou X X, Ye K and Shen L L. 2022a. CLIMS: cross language image matching for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4473-4482 [DOI: 10.1109/cvpr52688.2022.00444]
Xie J H, Xiang J F, Chen J L, Hou X X, Zhao X D and Shen L L. 2022b. C2AM: contrastive learning of class-agnostic activation map for weakly supervised object localization and semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 989-998 [DOI: 10.1109/cvpr52688.2022.00106]
Xu J S, Zhou C W, Cui Z, Xu C Y, Huang Y G, Shen P C, Li S X and Yang J. 2021. Scribble-supervised semantic segmentation inference//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 15334-15343 [DOI: 10.1109/iccv48922.2021.01507]
Xu L, Ouyang W L, Bennamoun M, Boussaid F and Xu D. 2022. Multi-class token Transformer for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4300-4309 [DOI: 10.1109/cvpr52688.2022.00427]
Xu L, Ouyang W L, Bennamoun M, Boussaid F and Xu D. 2023. Learning multi-modal class-specific tokens for weakly supervised dense object localization//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 19596-19605 [DOI: 10.1109/CVPR52729.2023.01877]
Yu Z, Zhuge Y Z, Lu H C and Zhang L H. 2019. Joint learning of saliency detection and weakly supervised semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7223-7233 [DOI: 10.1109/ICCV.2019.00732]
Zhang B F, Xiao J M, Wei Y C, Sun M J and Huang K Z. 2020a. Reliability does matter: an end-to-end weakly supervised semantic segmentation approach//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12765-12772 [DOI: 10.1609/aaai.v34i07.6971]
Zhang B F, Xiao J M and Zhao Y. 2021a. Dynamic feature regularized loss for weakly supervised semantic segmentation [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2108.01296.pdf
Zhang D, Zhang H W, Tang J H, Hua X S and Sun Q R. 2020b. Causal intervention for weakly-supervised semantic segmentation//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc: #56
Zhang F, Gu C C, Zhang C Y and Dai Y C. 2021b. Complementary patch for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 7222-7231 [DOI: 10.1109/iccv48922.2021.00715]
Zhang T Y, Lin G S, Liu W D, Cai J F and Kot A. 2020c. Splitting vs. merging: mining object regions with discrepancy and intersection loss for weakly supervised semantic segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 663-679 [DOI: 10.1007/978-3-030-58542-6_40]
Zhang X R, Peng Z L, Zhu P, Zhang T Y, Li C, Zhou H Y and Jiao L C. 2021c. Adaptive affinity loss and erroneous pseudo-label refinement for weakly supervised semantic segmentation//Proceedings of the 29th ACM International Conference on Multimedia. Chengdu, China: ACM: 5463-5472 [DOI: 10.1145/3474085.3475675]
Zhao W X, Zhou K, Li J Y, Tang T Y, Wang X L, Hou Y P, Min Y Q, Zhang B C, Zhang J J, Dong Z C, Du Y F, Yang C, Chen Y S, Chen Z P, Jiang J H, Ren R Y, Li Y F, Tang X Y, Liu Z K, Liu P Y, Nie J Y and Wen R J. 2023. A survey of large language models [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2303.18223.pdf
Zhou B L, Khosla A, Lapedriza A, Oliva A and Torralba A. 2016. Learning deep features for discriminative localization//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2921-2929 [DOI: 10.1109/CVPR.2016.319]
Zhou T F, Zhang M J, Zhao F and Li J W. 2022. Regional semantic contrast and aggregation for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4289-4299 [DOI: 10.1109/cvpr52688.2022.00426]