Weakly supervised semantic segmentation based on deep learning
Vol. 29, No. 5, 2024, pp. 1146-1168
Received: 2023-09-12
Revised: 2023-12-05
Published in print: 2024-05-16
DOI: 10.11834/jig.230628
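Semantic segmentation is a fundamental task in computer vision that aims to assign a semantic category label to every pixel, achieving pixel-level understanding of an image. Driven by advances in deep learning, fully supervised semantic segmentation methods based on deep learning have made great progress. However, these methods usually require large amounts of training data with pixel-level annotations, which are very costly to produce, limiting their application in real-world scenarios such as autonomous driving, medical image analysis, and industrial control. To reduce annotation cost and further broaden the application scenarios of semantic segmentation, researchers are paying increasing attention to deep learning-based weakly supervised semantic segmentation methods, which aim to produce pixel-level segmentation predictions from weak annotations such as image-level labels, bounding boxes, scribbles, and points. This paper first briefly introduces the semantic segmentation task and analyzes the difficulties faced by fully supervised semantic segmentation, which motivate weakly supervised semantic segmentation. It then introduces the related datasets and evaluation metrics. Next, according to the type of weak annotation and the attention each has received, it reviews and discusses research progress in weakly supervised semantic segmentation from three aspects: image-level annotations, other weak annotations, and assistance from large-scale models. The second category covers weakly supervised semantic segmentation based on bounding-box, scribble, and point annotations. Finally, it analyzes the open problems and challenges in the field and suggests possible future research directions, with the aim of further advancing research on weakly supervised semantic segmentation.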
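The abstract mentions evaluation metrics without naming them. The metric most commonly reported for semantic segmentation on PASCAL VOC 2012 is mean intersection over union (mIoU), where the IoU of class c is TP / (TP + FP + FN). The sketch below assumes integer label maps and treats the value 255 as "void" (the PASCAL VOC convention for unlabeled boundary pixels); the function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs; rows are ground truth, columns are predictions."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    valid = gt != ignore_index  # void pixels do not contribute to the score
    idx = gt[valid] * num_classes + pred[valid]
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); classes with an empty denominator are skipped."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as c but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled c but predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return iou, float(np.nanmean(iou))


if __name__ == "__main__":
    gt = np.array([[0, 0, 1], [1, 1, 255]])
    pred = np.array([[0, 1, 1], [1, 1, 0]])
    conf = confusion_matrix(pred, gt, num_classes=2)
    iou, miou = mean_iou(conf)
    print(f"mIoU = {miou:.3f}")  # prints "mIoU = 0.625"
```

In a full evaluation, the confusion matrix is typically summed over every image in the validation set before IoU is computed, so large and small images are weighted by pixel count rather than averaged per image.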
Semantic segmentation is an important and fundamental task in computer vision. Its goal is to assign a semantic category label to each pixel in an image, achieving pixel-level understanding. It has wide applications in areas such as autonomous driving, virtual reality, and medical image analysis. With the development of deep learning in recent years, remarkable progress has been achieved in fully supervised semantic segmentation, which requires a large amount of training data with pixel-level annotations. However, accurate pixel-level annotations are difficult to obtain because producing them consumes substantial time, money, and human labor, which limits the widespread application of these methods in practice. To reduce annotation cost and further expand the application scenarios of semantic segmentation, researchers are paying increasing attention to weakly supervised semantic segmentation (WSSS) based on deep learning. The goal is to develop semantic segmentation models that use weak annotations instead of dense pixel-level annotations to accurately predict pixel-level segmentation. Weak annotations mainly include image-level, bounding-box, scribble, and point annotations. The key problem in WSSS is how to exploit the limited annotation information, adopt appropriate training strategies, and design powerful models that bridge the gap between weak supervision and pixel-level annotations. This study classifies and summarizes deep learning-based WSSS methods, analyzes the challenges and problems encountered by recent methods, and provides insights into future research directions. First, we introduce WSSS as a solution to the limitations of fully supervised semantic segmentation. Second, we introduce the related datasets and evaluation metrics.
Third, we review and discuss the research progress of WSSS in three categories: image-level annotations, other weak annotations, and assistance from large-scale models, where the second category includes bounding-box, scribble, and point annotations. Specifically, image-level annotations only indicate which object categories an image contains, without specifying the positions of the target objects. Existing methods generally follow a two-stage training process: 1) producing class activation maps (CAMs), also known as initial seed regions, from which high-quality pixel-level pseudo labels are generated; and 2) training a fully supervised semantic segmentation model with the produced pixel-level pseudo labels. Depending on whether the pixel-level pseudo labels are updated during the second training stage, WSSS methods based on image-level annotations can be further divided into offline and online approaches. Offline approaches treat the two stages independently: the initial seed regions are optimized to obtain more reliable pixel-level pseudo labels, which then remain unchanged throughout the second stage. According to their optimization strategies, these approaches are often divided into six classes: CAM ensembles, image erasing, co-occurrence relationship decoupling, affinity propagation, additional supervision information, and self-supervised learning. In online approaches, the pixel-level pseudo labels are continually updated throughout the second stage, and pseudo-label generation and the semantic segmentation model are optimized jointly. Online approaches can be trained end to end, which makes training more efficient. Compared with image-level annotations, other weak annotations, including bounding boxes, scribbles, and points, provide stronger supervision signals.
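The first stage for image-level labels can be illustrated with a minimal sketch of how a CAM is commonly computed from a classifier that ends in global average pooling followed by a linear layer: the CAM of class c is the class-weighted sum of the last feature maps, passed through ReLU and normalized by its maximum. The sketch then turns the CAMs into an offline pseudo label with a fixed background threshold, one common heuristic among several. The array shapes, function names, the label convention (0 for background, class index + 1 for objects), and the threshold 0.3 are illustrative assumptions, not the settings of any specific surveyed method.

```python
import numpy as np


def compute_cams(features, fc_weights, image_labels):
    """CAMs for the classes listed in the image-level label.

    features:     (K, H, W) last-layer feature maps (hypothetical input).
    fc_weights:   (C, K) weights of the linear classifier applied after global average pooling.
    image_labels: indices (0..C-1) of the foreground classes present in the image.
    Returns an array of shape (len(image_labels), H, W) with values in [0, 1].
    """
    idx = list(image_labels)
    cams = np.einsum("ck,khw->chw", fc_weights[idx], features)
    cams = np.maximum(cams, 0.0)  # ReLU: keep only positive class evidence
    peak = cams.reshape(len(idx), -1).max(axis=1)
    return cams / np.maximum(peak, 1e-5)[:, None, None]  # per-class max normalization


def cams_to_pseudo_label(cams, image_labels, bg_threshold=0.3):
    """Offline pseudo label: a pixel is background unless some class CAM exceeds bg_threshold."""
    bg = np.full((1,) + cams.shape[1:], bg_threshold)
    scores = np.concatenate([bg, cams], axis=0)  # channel 0 is the background score
    winner = scores.argmax(axis=0)  # ties go to background (first channel)
    lookup = np.array([0] + [c + 1 for c in image_labels])  # channel -> label value
    return lookup[winner]


if __name__ == "__main__":
    feats = np.array([[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]])  # K=2, H=1, W=3
    weights = np.eye(2)  # toy classifier: class c responds to feature channel c
    cams = compute_cams(feats, weights, [0, 1])
    print(cams_to_pseudo_label(cams, [0, 1]))  # [[1 0 2]]
```

Because only classes in the image-level label get a CAM, the image-level annotation already rules out absent categories; the threshold decides how far each object's seed region extends, which is exactly what the offline refinement strategies listed above try to improve before the pseudo label is frozen.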
Among these other weak annotations, bounding-box annotations provide not only object category labels but also object positions. Regions outside a bounding box are always treated as background, whereas the region inside a box contains both foreground and background. Therefore, research on bounding-box annotations mainly focuses on accurately separating foreground from background within each box, thereby producing more accurate pixel-level pseudo labels for training the subsequent semantic segmentation network. Scribble and point annotations not only indicate the categories of objects in the image but also provide local positional information about the target objects. For scribble annotations, more complete pseudo labels for supervising semantic segmentation can be produced by inferring the categories of unlabeled regions from the annotated scribbles. For point annotations, the associated semantic information is propagated to the entire image through label propagation, distance metric learning, and loss function optimization. In addition, given the rapid development of large-scale models, this paper discusses recent research on using large-scale models to assist WSSS. Large-scale models can leverage their pretrained general knowledge to understand images and generate accurate pixel-level pseudo labels, thereby improving the final segmentation performance. This paper also reports quantitative segmentation results on the pattern analysis, statistical modeling and computational learning visual object classes 2012 (PASCAL VOC 2012) dataset to evaluate the performance of different WSSS methods. Finally, four challenges and potential future research directions are discussed. First, a performance gap remains between weakly supervised and fully supervised methods. To bridge this gap, research should continue to improve the accuracy of pixel-level pseudo labels.
Second, when WSSS models are applied to real-world scenarios, they may encounter object categories that never appeared in the training data, which requires the models to adapt well enough to identify and segment unknown objects. Third, existing research mainly focuses on improving accuracy without considering the model size and inference speed of WSSS networks, which poses a major challenge for deploying these models in real-world applications that require real-time estimation and online decisions. Fourth, the scarcity of relevant datasets for evaluating different WSSS models and algorithms is also a major obstacle, leading to performance degradation and limited generalization capability. Therefore, large-scale WSSS datasets with high quality, great diversity, and a wide variety of image types need to be constructed.
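The bounding-box setting described in the abstract (everything outside a box is background, while the inside mixes foreground and background) can be made concrete with a minimal sketch that builds a coarse pixel-level pseudo label from boxes alone. The "central core" rule below is a simple heuristic assumed for illustration: pixels outside every box become background, the central part of each box takes the box's class, and the rest of the box is marked as ignored (255) so that a loss function can skip it. The function name, the box tuple format, and `core_ratio` are hypothetical; the surveyed methods replace this crude rule with learned or algorithmic foreground-background separation inside the box.

```python
import numpy as np

IGNORE = 255  # label value assumed to be excluded from the training loss


def boxes_to_pseudo_label(height, width, boxes, core_ratio=0.5):
    """Coarse pseudo label from box annotations.

    boxes: list of (x0, y0, x1, y1, class_id), with exclusive x1/y1 and class_id >= 1.
    - Outside every box: 0 (background, guaranteed by the box annotation).
    - Inside a box:      IGNORE (uncertain foreground/background mix),
                         except a centered core whose sides are core_ratio of the box's,
                         which is assigned class_id (heuristic assumption).
    """
    label = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1, _ in boxes:
        label[y0:y1, x0:x1] = IGNORE
    # Paint cores from the largest box to the smallest so that smaller boxes keep their core.
    order = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    for x0, y0, x1, y1, cls in order:
        w, h = x1 - x0, y1 - y0
        cx0 = x0 + int(round(w * (1 - core_ratio) / 2))
        cy0 = y0 + int(round(h * (1 - core_ratio) / 2))
        cx1 = cx0 + max(1, int(round(w * core_ratio)))
        cy1 = cy0 + max(1, int(round(h * core_ratio)))
        label[cy0:cy1, cx0:cx1] = cls
    return label


if __name__ == "__main__":
    print(boxes_to_pseudo_label(6, 6, [(1, 1, 5, 5, 3)]))
```

Marking the uncertain box region as ignored, rather than filling the whole box with the class label, keeps the supervision restricted to pixels whose label the annotation actually supports.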
Ahn J , Cho S and Kwak S . 2019 . Weakly supervised learning of instance segmentation with inter-pixel relations // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach, USA : IEEE: 2204 - 2213 [ DOI: 10.1109/cvpr.2019.00231 http://dx.doi.org/10.1109/cvpr.2019.00231 ]
Ahn J and Kwak S . 2018 . Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation // Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City, USA : IEEE: 4981 - 4990 [ DOI: 10.1109/cvpr.2018.00523 http://dx.doi.org/10.1109/cvpr.2018.00523 ]
Araslanov N and Roth S . 2020 . Single-stage semantic segmentation from image labels // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle, USA : IEEE: 4252 - 4261 [ DOI: 10.1109/cvpr42600.2020.00431 http://dx.doi.org/10.1109/cvpr42600.2020.00431 ]
Arbelez P , Pont-Tuset J , Barron J , Marques F and Malik J . 2014 . Multiscale combinatorial grouping // Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition . Columbus, USA : IEEE: 328 - 335 [ DOI: 10.1109/cvpr.2014.49 http://dx.doi.org/10.1109/cvpr.2014.49 ]
Baltrušaitis T , Ahuja C and Morency L P . 2019 . Multimodal machine learning: a survey and taxonomy . IEEE Transactions on Pattern Analysis and Machine Intelligence , 41 ( 2 ): 423 - 443 [ DOI: 10.1109/TPAMI.2018.2798607 http://dx.doi.org/10.1109/TPAMI.2018.2798607 ]
Bearman A , Russakovsky O , Ferrari V and Li F F . 2016 . What’s the point: semantic segmentation with point supervision // Proceedings of the 14th European Conference on Computer Vision . Amsterdam, the Netherlands : Springer: 549 - 565 [ DOI: 10.1007/978-3-319-46478-7_34 http://dx.doi.org/10.1007/978-3-319-46478-7_34 ]
Chang Y T , Wang Q S , Hung W C , Piramuthu R , Tsai Y H and Yang M H . 2020 . Weakly-supervised semantic segmentation via sub-category exploration // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle, USA : IEEE: 8988 - 8997 [ DOI: 10.1109/cvpr42600.2020.00901 http://dx.doi.org/10.1109/cvpr42600.2020.00901 ]
Chen L Y , Wu W W , Fu C C , Han X and Zhang Y T . 2020 . Weakly supervised semantic segmentation with boundary exploration // Proceedings of the 16th European Conference on Computer Vision . Glasgow, UK : Springer: 347 - 362 [ DOI: 10.1007/978-3-030-58574-7_21 http://dx.doi.org/10.1007/978-3-030-58574-7_21 ]
Chen Q , Yang L X , Lai J H and Xie X H . 2022a . Self-supervised image-specific prototype exploration for weakly supervised semantic segmentation // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans, USA : IEEE: 4278 - 4288 [ DOI: 10.1109/cvpr52688.2022.00425 http://dx.doi.org/10.1109/cvpr52688.2022.00425 ]
Chen T , Yao Y Z and Tang J H . 2023b . Multi-granularity denoising and bidirectional alignment for weakly supervised semantic segmentation . IEEE Transactions on Image Processing , 32 : 2960 - 2971 [ DOI: 10.1109/TIP.2023.3275913 http://dx.doi.org/10.1109/TIP.2023.3275913 ]
Chen T , Yao Y Z , Zhang L , Wang Q , Xie G S and Shen F M . 2023c . Saliency guided inter- and intra-class relation constraints for weakly supervised semantic segmentation . IEEE Transactions on Multimedia , 25 : 1727 - 1737 [ DOI: 10.1109/tmm.2022.3157481 http://dx.doi.org/10.1109/tmm.2022.3157481 ]
Chen T L , Mai Z D , Li R W and Chao W L . 2023a . Segment anything model (SAM) enhanced pseudo labels for weakly supervised semantic segmentation [EB/OL]. [ 2023-08-28 ]. https://arxiv.org/pdf/ 2305.05803.pdf https://arxiv.org/pdf/2305.05803.pdf
Chen Z , Tian Z Q , Zhu J H , Li C and Du S Y . 2022b . C-CAM: causal CAM for weakly supervised semantic segmentation on medical image // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans, USA : IEEE: 11666 - 11675 [ DOI: 10.1109/cvpr52688.2022.01138 http://dx.doi.org/10.1109/cvpr52688.2022.01138 ]
Chen Z Z and Sun Q R . 2023 . Extracting class activation maps from non-discriminative features as well // Proceedings of 2023 IEEE/ CVF Conference on Computer Vision and Pattern Recognition . Vancouver, Canada : IEEE: 3135 - 3144 [ DOI: 10.1109/CVPR52729.2023.00306 http://dx.doi.org/10.1109/CVPR52729.2023.00306 ]
Cordts M , Omran M , Ramos S , Rehfeld T , Enzweiler M , Benenson R , Franke U , Roth S and Schiele B . 2016 . The cityscapes dataset for semantic urban scene understanding // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas, USA : IEEE: 3213 - 3223 [ DOI: 10.1109/cvpr.2016.350 http://dx.doi.org/10.1109/cvpr.2016.350 ]
Dai J F , He K M and Sun J . 2015 . BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation // Proceedings of 2015 IEEE International Conference on Computer Vision . Santiago, Chile : IEEE: 1635 - 1643 [ DOI: 10.1109/iccv.2015.191 http://dx.doi.org/10.1109/iccv.2015.191 ]
Dalal N and Triggs B . 2005 . Histograms of oriented gradients for human detection // Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition . San Diego, USA : IEEE: 886 - 893 [ DOI: 10.1109/CVPR.2005.177 http://dx.doi.org/10.1109/CVPR.2005.177 ]
Dempster A P , Laird N M and Rubin D B . 1977 . Maximum likelihood from incomplete data via the EM algorithm . Journal of the Royal Statistical Society : Series B (Methodological) , 39 ( 1 ): 1 - 22 [ DOI: 10.1111/j.2517-6161.1977.tb01600.x http://dx.doi.org/10.1111/j.2517-6161.1977.tb01600.x ]
Dosovitskiy A , Beyer L , Kolesnikov A , Weissenborn D , Zhai X H , Unterthiner T , Dehghani M , Minderer M , Heigold G , Gelly S , Uszkoreit J and Houlsby N . 2021 . An image is worth 16 × 16 words: Transformers for image recognition at scale //Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview.net
Du Y , Fu Z H , Liu Q J and Wang Y H . 2022 . Weakly supervised semantic segmentation by pixel-to-prototype contrast // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans, USA : IEEE: 4310 - 4319 [ DOI: 10.1109/CVPR52688.2022.00428 http://dx.doi.org/10.1109/CVPR52688.2022.00428 ]
Everingham M , Eslami S M A , Van Gool L , Williams C K I , Winn J and Zisserman A . 2015 . The Pascal visual object classes challenge: a retrospective . International Journal of Computer Vision , 111 ( 1 ): 98 - 136 [ DOI: 10.1007/s11263-014-0733-5 http://dx.doi.org/10.1007/s11263-014-0733-5 ]
Gao S H , Li Z Y , Yang M H , Cheng M M , Han J W and Torr P . 2023 . Large-scale unsupervised semantic segmentation . IEEE Transactions on Pattern Analysis and Machine Intelligence , 45 ( 6 ): 7457 - 7476 [ DOI: 10.1109/TPAMI.2022.3218275 http://dx.doi.org/10.1109/TPAMI.2022.3218275 ]
Grill J B , Strub F , Altché F , Tallec C , Richemond P H , Buchatskaya E , Doersch C , Pires B A , Guo Z D , Azar M G , Piot B , Kavukcuoglu K , Munos R and Valko M . 2020 . Bootstrap your own latent a new approach to self-supervised learning // Proceedings of the 34th International Conference on Neural Information Processing Systems . Vancouver, Canada : Curran Associates Inc: #1786
Guidotti R , Monreale A , Ruggieri S , Turini F , Giannotti F and Pedreschi D . 2019 . A survey of methods for explaining black box models . ACM Computing Surveys , 51 ( 5 ): 1 - 42 [ DOI: 10.1145/3236009 http://dx.doi.org/10.1145/3236009 ]
He K M , Fan H Q , Wu Y X , Xie S N and Girshick R . 2020 . Momentum contrast for unsupervised visual representation learning // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle, USA : IEEE: 9726 - 9735 [ DOI: 10.1109/cvpr42600.2020.00975 http://dx.doi.org/10.1109/cvpr42600.2020.00975 ]
Hinton G E and Salakhutdinov R R . 2006 . Reducing the dimensionality of data with neural networks . Science , 313 ( 5786 ): 504 - 507 [ DOI: 10.1126/science.1127647 http://dx.doi.org/10.1126/science.1127647 ]
Hou Q B , Jiang P T , Wei Y C and Chen M M . 2018 . Self-erasing network for integral object attention // Proceedings of the 32nd International Conference on Neural Information Processing Systems . Montréal, Canada : Curran Associates Inc: 547 - 557
Huang Z L , Wang X G , Wang J S , Liu W Y and Wang J D . 2018 . Weakly-supervised semantic segmentation network with deep seeded region growing // Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City, USA : IEEE: 7014 - 7023 [ DOI: 10.1109/CVPR.2018.00733 http://dx.doi.org/10.1109/CVPR.2018.00733 ]
Jiang P T , Han L H , Hou Q B , Cheng M M and Wei Y C . 2022a . Online attention accumulation for weakly supervised semantic segmentation . IEEE Transactions on Pattern Analysis and Machine Intelligence , 44 ( 10 ): 7062 - 7077 [ DOI: 10.1109/tpami.2021.3092573 http://dx.doi.org/10.1109/tpami.2021.3092573 ]
Jiang P T , Hou Q B , Cao Y , Cheng M M , Wei Y C and Xiong H K . 2019 . Integral object mining via online attention accumulation // Proceedings of 2019 IEEE/CVF International Conference on Computer Vision . Seoul, Korea (South) : IEEE: 2070 - 2079 [ DOI: 10.1109/iccv.2019.00216 http://dx.doi.org/10.1109/iccv.2019.00216 ]
Jiang P T , Yang Y Q , Hou Q B and Wei Y C . 2022b . L2G: a simple local-to-global knowledge transfer framework for weakly supervised semantic segmentation // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans, USA : IEEE: 16865 - 16875 [ DOI: 10.1109/cvpr52688.2022.01638 http://dx.doi.org/10.1109/cvpr52688.2022.01638 ]
Jing L L and Tian Y L . 2021 . Self-supervised visual feature learning with deep neural networks: a survey . IEEE Transactions on Pattern Analysis and Machine Intelligence , 43 ( 11 ): 4037 - 4058 [ DOI: 10.1109/TPAMI.2020.2992393 http://dx.doi.org/10.1109/TPAMI.2020.2992393 ]
Jo S and Yu I J . 2021 . Puzzle-CAM: improved localization via matching partial and full features // Proceedings of 2021 IEEE International Conference on Image Processing . Anchorage, USA : IEEE: 639 - 643 [ DOI: 10.1109/icip42928.2021.9506058 http://dx.doi.org/10.1109/icip42928.2021.9506058 ]
Joon Oh S , Benenson R , Khoreva A , Akata Z , Fritz M and Schiele B . 2017 . Exploiting saliency for object segmentation from image level labels // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu, USA : IEEE: 5038 - 5047 [ DOI: 10.1109/cvpr.2017.535 http://dx.doi.org/10.1109/cvpr.2017.535 ]
Ke T W , Hwang J J and Yu S X . 2021 . Universal weakly supervised segmentation by pixel-to-segment contrastive learning //Proceedings of the 9th International Conference on Learning Representations. [s.l.]: OpenReview .net
Khoreva A , Benenson R , Hosang J , Hein M and Schiele B . 2017 . Simple does it: weakly supervised instance and semantic segmentation // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu, USA : IEEE: 1665 - 1674 [ DOI: 10.1109/cvpr.2017.181 http://dx.doi.org/10.1109/cvpr.2017.181 ]
Kim D , Cho D , Yoo D and Kweon I S . 2017 . Two-phase learning for weakly supervised object localization // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice, Italy : IEEE: 3554 - 3564 [ DOI: 10.1109/iccv.2017.382 http://dx.doi.org/10.1109/iccv.2017.382 ]
Kirillov A , Mintun E , Ravi N , Mao H Z , Rolland C , Gustafson L , Xiao T T , Whitehead S , Berg A C , Lo W Y , Dollr P and Girshick R . 2023 . Segment anything [EB/OL]. [ 2023-08-28 ]. https://arxiv.org/pdf/2304.02643.pdf https://arxiv.org/pdf/2304.02643.pdf
Kolesnikov A and Lampert C H . 2016 . Seed, expand and constrain: three principles for weakly-supervised image segmentation // Proceedings of the 14th European Conference on Computer Vision . Amsterdam, the Netherlands : Springer: 695 - 711 [ DOI: 10.1007/978-3-319-46493-0_42 http://dx.doi.org/10.1007/978-3-319-46493-0_42 ]
Krähenbühl P and Koltun V . 2011 . Efficient inference in fully connected CRFs with Gaussian edge potentials // Proceedings of the 24th International Conference on Neural Information Processing Systems . Granada, Spain : Curran Associates Inc: 109 - 117
Kulharia V , Chandra S , Agrawal A , Torr P and Tyagi A . 2020 . Box2Seg: attention weighted loss and discriminative feature learning for weakly supervised segmentation // Proceedings of the 16th European Conference on Computer Vision . Online : Springer: 290 - 308 [ DOI: 10.1007/978-3-030-58583-9_18 http://dx.doi.org/10.1007/978-3-030-58583-9_18 ]
Kweon H , Yoon S H , Kim H , Park D and Yoon K J . 2021 . Unlocking the potential of ordinary classifier: class-specific adversarial erasing framework for weakly supervised semantic segmentation // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision . Montreal, Canada : IEEE: 6974 - 6983 [ DOI: 10.1109/iccv48922.2021.00691 http://dx.doi.org/10.1109/iccv48922.2021.00691 ]
Kweon H , Yoon S H and Yoon K J . 2023 . Weakly supervised semantic segmentation via adversarial learning of classifier and reconstructor // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver, Canada : IEEE: 11329 - 11339 [ DOI: 10.1109/CVPR52729.2023.01090 http://dx.doi.org/10.1109/CVPR52729.2023.01090 ]
Lafferty J D , McCallum A and Pereira F C N . 2001 . Conditional random fields: probabilistic models for segmenting and labeling sequence data // Proceedings of the 18th International Conference on Machine Learning . Williams College, USA : Morgan Kaufmann Publishers Inc
Lee J , Kim E , Lee S , Lee J and Yoon S . 2019 . FickleNet: weakly and semi-supervised semantic image segmentation using stochastic inference // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Los Angeles, USA : IEEE: 5262 - 5271 [ DOI: 10.1109/CVPR.2019.00541 http://dx.doi.org/10.1109/CVPR.2019.00541 ]
Lee J , Kim E and Yoon S . 2021a . Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville, USA : IEEE: 4070 - 4078 [ DOI: 10.1109/cvpr46437.2021.00406 http://dx.doi.org/10.1109/cvpr46437.2021.00406 ]
Lee J , Yi J , Shin C and Yoon S . 2021b . BBAM: bounding box attribution map for weakly supervised semantic and instance segmentation // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville, USA : IEEE: 2643 - 2651 [ DOI: 10.1109/cvpr46437.2021.00267 http://dx.doi.org/10.1109/cvpr46437.2021.00267 ]
Lee S , Lee M , Lee J and Shim H . 2021c . Railroad is not a train: saliency as pseudo-pixel supervision for weakly supervised semantic segmentation // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville, USA : IEEE: 5491 - 5501 [ DOI: 10.1109/cvpr46437.2021.00545 http://dx.doi.org/10.1109/cvpr46437.2021.00545 ]
Li J , Fan J S and Zhang Z X . 2022a . Towards noiseless object contours for weakly supervised semantic segmentation // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans, USA : IEEE: 16835 - 16844 [ DOI: 10.1109/cvpr52688.2022.01635 http://dx.doi.org/10.1109/cvpr52688.2022.01635 ]
Li X Y , Zhou T F , Li J W , Zhou Y and Zhang Z X . 2021 . Group-wise semantic mining for weakly supervised semantic segmentation //Proceedings of the 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI: 1984 - 1992 [ DOI: 10.1609/aaai.v35i3.16294 http://dx.doi.org/10.1609/aaai.v35i3.16294 ]
Li Y W , Zhao H S , Qi X J , Chen Y K , Qi L , Wang L W , Li Z M , Sun J and Jia J Y . 2022b . Fully convolutional networks for panoptic segmentation with point-based supervision . IEEE Transactions on Pattern Analysis and Machine Intelligence , 45 ( 4 ): 4552 - 4568 [ DOI: 10.1109/tpami.2022.3200416 http://dx.doi.org/10.1109/tpami.2022.3200416 ]
Lin D , Dai J F , Jia J Y , He K M and Sun J . 2016 . ScribbleSup: scribble-supervised convolutional networks for semantic segmentation // Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas, USA : IEEE: 3159 - 3167 [ DOI: 10.1109/cvpr.2016.344 http://dx.doi.org/10.1109/cvpr.2016.344 ]
Lin T Y , Maire M , Belongie S , Hays J , Perona P , Ramanan D , Dollr P and Zitnick C L . 2014 . Microsoft COCO: common objects in context // Proceedings of the 13th European Conference on Computer Vision . Zurich, Switzerland : Springer: 740 - 755 [ DOI: 10.1007/978-3-319-10602-1_48 http://dx.doi.org/10.1007/978-3-319-10602-1_48 ]
Lin Y Q , Chen M H , Wang W X , Wu B X , Li K , Lin B B , Liu H F and He X F . 2023 . CLIP is also an efficient segmenter: a text-driven approach for weakly supervised semantic segmentation // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver, Canada : IEEE: 15305 - 15314 [ DOI: 10.1109/CVPR52729.2023.01469 http://dx.doi.org/10.1109/CVPR52729.2023.01469 ]
Liu S L , Zeng Z Y , Ren T H , Li F , Zhang H , Yang J , Li C Y , Yang J W , Su H , Zhu J and Zhang L . 2023 . Grounding DINO: marrying DINO with grounded pre-training for open-set object detection [EB/OL]. [ 2023-08-28 ]. https://arxiv.org/pdf/2303.05499.pdf https://arxiv.org/pdf/2303.05499.pdf
Long J , Shelhamer E and Darrell T . 2015 . Fully convolutional networks for semantic segmentation // Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition . Boston, USA : IEEE: 3431 - 3440 [ DOI: 10.1109/CVPR.2015.7298965 http://dx.doi.org/10.1109/CVPR.2015.7298965 ]
Lowe D G . 2004 . Distinctive image features from scale-invariant keypoints . International Journal of Computer Vision , 60 ( 2 ): 91 - 110 [ DOI: 10.1023/B:VISI.0000029664.99615.94 http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94 ]
MacQueen J . 1967 . Some methods for classification and analysis of multivariate observations // The 5th Berkeley Symposium on Mathematical Statistics and Probability . Oakland, USA : Unversity of California Press: 281 - 297
Maninis K K , Caelles S , Pont-Tuset J and van Gool L . 2018 . Deep extreme cut: from extreme points to object segmentation // Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City, USA : IEEE: 616 - 625 [ DOI: 10.1109/cvpr.2018.00071 http://dx.doi.org/10.1109/cvpr.2018.00071 ]
McEver R A and Manjunath B S . 2020 . PCAMs: weakly supervised semantic segmentation using point supervision [EB/OL]. [ 2023-08-28 ]. https://arxiv.org/pdf/2007.05615.pdf https://arxiv.org/pdf/2007.05615.pdf
Minaee S , Boykov Y Y , Porikli F , Plaza A J , Kehtarnavaz N and Terzopoulos D . 2022 . Image segmentation using deep learning: a survey . IEEE Transactions on Pattern Analysis and Machine Intelligence , 44 ( 7 ): 3523 - 3542 [ DOI: 10.1109/TPAMI.2021.3059968 http://dx.doi.org/10.1109/TPAMI.2021.3059968 ]
Mottaghi R , Chen X J , Liu X B , Cho N G , Lee S W , Fidler S , Urtasun R and Yuille A . 2014 . The role of context for object detection and semantic segmentation in the wild // Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition . Columbus, USA : IEEE: 891 - 898 [ DOI: 10.1109/cvpr.2014.119 http://dx.doi.org/10.1109/cvpr.2014.119 ]
Neuhold G , Ollmann T , Rota Bulo S and Kontschieder P . 2017 . The Mapillary vistas dataset for semantic understanding of street scenes // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice, Italy : IEEE: 4990 - 4999 [ DOI: 10.1109/iccv.2017.534 http://dx.doi.org/10.1109/iccv.2017.534 ]
Oh Y , Kim B and Ham B . 2021 . Background-aware pooling and noise-aware loss for weakly-supervised semantic segmentation // Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville, USA : IEEE: 6909 - 6918 [ DOI: 10.1109/cvpr46437.2021.00684 http://dx.doi.org/10.1109/cvpr46437.2021.00684 ]
Ojala T , Pietikainen M and Harwood D . 1994 . Performance evaluation of texture measures with classification based on Kullback discrimination of distributions // Proceedings of the 12th International Conference on Pattern Recognition . Jerusalem, Israel : IEEE: 582 - 585 [ DOI: 10.1109/ICPR.1994.576366 http://dx.doi.org/10.1109/ICPR.1994.576366 ]
Pan S J and Yang Q . 2010 . A survey on transfer learning . IEEE Transactions on Knowledge and Data Engineering , 22 ( 10 ): 1345 - 1359 [ DOI: 10.1109/TKDE.2009.191 http://dx.doi.org/10.1109/TKDE.2009.191 ]
Papadopoulos D P , Uijlings J R R , Keller F and Ferrari V . 2017 . Extreme clicking for efficient object annotation // Proceedings of 2017 IEEE International Conference on Computer Vision . Venice, Italy : IEEE: 4940 - 4949 [ DOI: 10.1109/iccv.2017.528 http://dx.doi.org/10.1109/iccv.2017.528 ]
Papandreou G , Chen L C , Murphy K P and Yuille A L . 2015 . Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation // Proceedings of 2015 IEEE International Conference on Computer Vision . Santiago, Chile : IEEE: 1742 - 1750 [ DOI: 10.1109/iccv.2015.203 http://dx.doi.org/10.1109/iccv.2015.203 ]
Peng Z L , Wang G C , Xie L X , Jiang D S , Shen W and Tian Q . 2023 . USAGE: a unified seed area generation paradigm for weakly supervised semantic segmentation // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision . Paris, France : IEEE [ DOI: 10.1109/ICCV51070.2023.00064 http://dx.doi.org/10.1109/ICCV51070.2023.00064 ]
Qian R , Wei Y C , Shi H H , Li J C , Liu J Y and Huang T . 2019 . Weakly supervised scene parsing with point-based distance metric learning // Proceedings of the 33rd AAAI Conference on Artificial Intelligence . Honolulu, USA : AAAI: 8843 - 8850 [ DOI: 10.1609/aaai.v33i01.33018843 http://dx.doi.org/10.1609/aaai.v33i01.33018843 ]
Qing C , Yu J , Xiao C B and Duan J . 2020 . Deep convolutional neural network for semantic image segmentation . Journal of Image and Graphics , 25 ( 6 ): 1069 - 1090
青晨 , 禹晶 , 肖创柏 , 段娟 . 2020 . 深度卷积神经网络图像语义分割研究进展 . 中国图象图形学报 , 25 ( 6 ): 1069 - 1090 [ DOI: 10.11834/jig.190355 http://dx.doi.org/10.11834/jig.190355 ]
Radford A , Kim J W , Hallacy C , Ramesh A , Goh G , Agarwal S , Sastry G , Askell A , Mishkin P , Clark J , Krueger G and Sutskever I . 2021 . Learning transferable visual models from natural language supervision //Proceedings of the 38th International Conference on Machine Learning. [s.l.]: ACM: 8748 - 8763
Ren D W , Wang Q L , Wei Y C , Meng D Y and Zuo W M . 2022 . Progress in weakly supervised learning for visual understanding . Journal of Image and Graphics , 27 ( 6 ): 1768 - 1798
任冬伟 , 王旗龙 , 魏云超 , 孟德宇 , 左旺孟 . 2022 . 视觉弱监督学习研究进展 . 中国图象图形学报 , 27 ( 6 ): 1768 - 1798 [ DOI: 10.11834/jig.220178 http://dx.doi.org/10.11834/jig.220178 ]
Rong S H , Tu B H , Wang Z L and Li J J . 2023 . Boundary-enhanced Co-training for weakly supervised semantic segmentation // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver, Canada : IEEE: 19574 - 19584 [ DOI: 10.1109/CVPR52729.2023.01875 http://dx.doi.org/10.1109/CVPR52729.2023.01875 ]
Rother C , Kolmogorov V and Blake A . 2004 . “GrabCut”: interactive foreground extraction using iterated graph cuts . ACM Transactions on Graphics , 23 ( 3 ): 309 - 314 [ DOI: 10.1145/1015706.1015720 http://dx.doi.org/10.1145/1015706.1015720 ]
Ru L X , Zhan Y B , Yu B S and Du B . 2022 . Learning affinity from attention: end-to-end weakly-supervised semantic segmentation with Transformers // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans, USA : IEEE: 16825 - 16834 [ DOI: 10.1109/CVPR52688.2022.01634 http://dx.doi.org/10.1109/CVPR52688.2022.01634 ]
Ru L X , Zheng H L , Zhan Y B and Du B . 2023 . Token contrast for weakly-supervised semantic segmentation // Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver, Canada : IEEE: 3093 - 3102 [ DOI: 10.1109/CVPR52729.2023.00302 http://dx.doi.org/10.1109/CVPR52729.2023.00302 ]
Scarselli F , Gori M , Tsoi A C , Hagenbuchner M and Monfardini G . 2009 . The graph neural network model . IEEE Transactions on Neural Networks , 20 ( 1 ): 61 - 80 [ DOI: 10.1109/TNN.2008.2005605 http://dx.doi.org/10.1109/TNN.2008.2005605 ]
Shen W , Peng Z L , Wang X H , Wang H Y , Cen J Z , Jiang D S , Xie L X , Yang X K and Tian Q . 2023 . A survey on label-efficient deep image segmentation: bridging the gap between weak supervision and dense prediction . IEEE Transactions on Pattern Analysis and Machine Intelligence , 45 ( 8 ): 9284 - 9305 [ DOI: 10.1109/TPAMI.2023.3246102 http://dx.doi.org/10.1109/TPAMI.2023.3246102 ]
Song C F , Huang Y , Ouyang W L and Wang L . 2019 . Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation // Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Los Angeles, USA : IEEE: 3136 - 3145 [ DOI: 10.1109/cvpr.2019.00325 http://dx.doi.org/10.1109/cvpr.2019.00325 ]
Su Y K, Sun R Z, Lin G S and Wu Q Y. 2021. Context decoupling augmentation for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 6984-6994 [DOI: 10.1109/iccv48922.2021.00692]
Sun G L, Wang W G, Dai J F and van Gool L. 2020. Mining cross-image semantics for weakly supervised semantic segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 347-365 [DOI: 10.1007/978-3-030-58536-5_21]
Sun K Y, Shi H Q, Zhang Z M and Huang Y M. 2021. ECS-Net: improving weakly supervised semantic segmentation by using connections between class activation maps//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 7263-7272 [DOI: 10.1109/iccv48922.2021.00719]
Sun W X, Liu Z Y, Zhang Y H, Zhong Y R and Barnes N. 2023. An alternative to WSSS? An empirical study of the segment anything model (SAM) on weakly-supervised semantic segmentation problems [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2305.01586.pdf
Tang M, Djelouah A, Perazzi F, Boykov Y and Schroers C. 2018a. Normalized cut loss for weakly-supervised CNN segmentation//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1818-1827 [DOI: 10.1109/cvpr.2018.00195]
Tang M, Perazzi F, Djelouah A, Ayed I B, Schroers C and Boykov Y. 2018b. On regularized losses for weakly-supervised CNN segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 524-540 [DOI: 10.1007/978-3-030-01270-0_31]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Vernaza P and Chandraker M. 2017. Learning random-walk label propagation for weakly-supervised semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2953-2961 [DOI: 10.1109/cvpr.2017.315]
Wang B, Qi G J, Tang S, Zhang T Z, Wei Y C, Li L H and Zhang Y D. 2019. Boundary perception guidance: a scribble-supervised semantic segmentation approach//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: Morgan Kaufmann: 3663-3669 [DOI: 10.24963/ijcai.2019/508]
Wang Y D, Zhang J, Kan M N, Shan S G and Chen X L. 2020. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12272-12281 [DOI: 10.1109/cvpr42600.2020.01229]
Wei Y C, Feng J S, Liang X D, Cheng M M, Zhao Y and Yan S C. 2017. Object region mining with adversarial erasing: a simple classification to semantic segmentation approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6488-6496 [DOI: 10.1109/cvpr.2017.687]
Wei Y C, Xiao H X, Shi H H, Jie Z Q, Feng J S and Huang T S. 2018. Revisiting dilated convolution: a simple approach for weakly- and semi-supervised semantic segmentation//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7268-7277 [DOI: 10.1109/CVPR.2018.00759]
Weinberger K Q and Saul L K. 2009. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10: 207-244
Wu T, Huang J S, Gao G Y, Wei X M, Wei X L, Luo X and Liu C H. 2021. Embedded discriminative attention mechanism for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 16760-16769 [DOI: 10.1109/cvpr46437.2021.01649]
Xian Y Q, Lampert C H, Schiele B and Akata Z. 2019. Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9): 2251-2265 [DOI: 10.1109/TPAMI.2018.2857768]
Xie J H, Hou X X, Ye K and Shen L L. 2022a. CLIMS: cross language image matching for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4473-4482 [DOI: 10.1109/cvpr52688.2022.00444]
Xie J H, Xiang J F, Chen J L, Hou X X, Zhao X D and Shen L L. 2022b. C²AM: contrastive learning of class-agnostic activation map for weakly supervised object localization and semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 989-998 [DOI: 10.1109/cvpr52688.2022.00106]
Xu J S, Zhou C W, Cui Z, Xu C Y, Huang Y G, Shen P C, Li S X and Yang J. 2021. Scribble-supervised semantic segmentation inference//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 15334-15343 [DOI: 10.1109/iccv48922.2021.01507]
Xu L, Ouyang W L, Bennamoun M, Boussaid F and Xu D. 2022. Multi-class token Transformer for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4300-4309 [DOI: 10.1109/cvpr52688.2022.00427]
Xu L, Ouyang W L, Bennamoun M, Boussaid F and Xu D. 2023. Learning multi-modal class-specific tokens for weakly supervised dense object localization//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 19596-19605 [DOI: 10.1109/CVPR52729.2023.01877]
Yu Z, Zhuge Y Z, Lu H C and Zhang L H. 2019. Joint learning of saliency detection and weakly supervised semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7223-7233 [DOI: 10.1109/ICCV.2019.00732]
Zhang B F, Xiao J M, Wei Y C, Sun M J and Huang K Z. 2020a. Reliability does matter: an end-to-end weakly supervised semantic segmentation approach//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 12765-12772 [DOI: 10.1609/aaai.v34i07.6971]
Zhang B F, Xiao J M and Zhao Y. 2021a. Dynamic feature regularized loss for weakly supervised semantic segmentation [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2108.01296.pdf
Zhang D, Zhang H W, Tang J H, Hua X S and Sun Q R. 2020b. Causal intervention for weakly-supervised semantic segmentation//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #56
Zhang F, Gu C C, Zhang C Y and Dai Y C. 2021b. Complementary patch for weakly supervised semantic segmentation//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 7222-7231 [DOI: 10.1109/iccv48922.2021.00715]
Zhang T Y, Lin G S, Liu W D, Cai J F and Kot A. 2020c. Splitting vs. merging: mining object regions with discrepancy and intersection loss for weakly supervised semantic segmentation//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 663-679 [DOI: 10.1007/978-3-030-58542-6_40]
Zhang X R, Peng Z L, Zhu P, Zhang T Y, Li C, Zhou H Y and Jiao L C. 2021c. Adaptive affinity loss and erroneous pseudo-label refinement for weakly supervised semantic segmentation//Proceedings of the 29th ACM International Conference on Multimedia. Chengdu, China: ACM: 5463-5472 [DOI: 10.1145/3474085.3475675]
Zhao W X, Zhou K, Li J Y, Tang T Y, Wang X L, Hou Y P, Min Y Q, Zhang B C, Zhang J J, Dong Z C, Du Y F, Yang C, Chen Y S, Chen Z P, Jiang J H, Ren R Y, Li Y F, Tang X Y, Liu Z K, Liu P Y, Nie J Y and Wen J R. 2023. A survey of large language models [EB/OL]. [2023-08-28]. https://arxiv.org/pdf/2303.18223.pdf
Zhou B L, Khosla A, Lapedriza A, Oliva A and Torralba A. 2016. Learning deep features for discriminative localization//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2921-2929 [DOI: 10.1109/CVPR.2016.319]
Zhou T F, Zhang M J, Zhao F and Li J W. 2022. Regional semantic contrast and aggregation for weakly supervised semantic segmentation//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4289-4299 [DOI: 10.1109/cvpr52688.2022.00426]