Cross-layer detail perception and group attention-guided semantic segmentation network for remote sensing images
2024, Vol. 29, No. 5, Pages 1277-1290
Print publication date: 2024-05-16
DOI: 10.11834/jig.230653
Li Linjuan, He Yun, Xie Gang, Zhang Haoxue, Bai Yanhong. 2024. Cross-layer detail perception and group attention-guided semantic segmentation network for remote sensing images. Journal of Image and Graphics, 29(05):1277-1290
Objective
Semantic segmentation is one of the key tasks in the intelligent interpretation of remote sensing images. Remote sensing images cover wide areas, contain complex, intersecting backgrounds, and show large variations in the sizes of ground objects. Existing methods segment multiscale ground objects poorly against complex backgrounds, producing fragmented regions and discontinuous boundaries. To address these problems, a cross-layer detail perception and group attention-guided semantic segmentation model is proposed for parsing high-resolution remote sensing images.
Method
First, the structurally novel ConvNeXt backbone is adopted to encode the features of the input image at each level. Second, a group collaborative attention module is designed: features are grouped, and channel-wise and spatial feature dependencies are modeled in parallel, with channel attention and spatial attention collaboratively reinforcing the feature information of important channels and regions. Next, a self-attention mechanism is introduced to build a cross-layer detail perception module, which uses the rich detail information in low-level features to guide the high-level feature layers in learning spatial details, ensuring regional integrity and boundary continuity in the segmentation results. Finally, with Taiyuan, Shanxi Province as the study area, a high-resolution remote sensing Taiyuan urban land cover dataset (TULCD) was built, and the proposed method accomplishes fine-grained land cover classification for the urban area of Taiyuan.
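For intuition, the grouped channel-and-spatial gating described above can be sketched as follows. This is a minimal illustration only; the module structure, layer choices, and group count are assumptions, not the authors' released implementation.

```python
# Illustrative sketch of grouped, parallel channel/spatial attention.
# The exact structure of the paper's group collaborative attention
# module is not given here; the layer choices below are assumptions.
import torch
import torch.nn as nn

class GroupCollabAttention(nn.Module):
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        gc = channels // groups
        # Channel branch: squeeze-and-excitation style gating per group.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(gc, gc, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: 7x7 conv over pooled maps, as in CBAM.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Fold the groups into the batch dimension for parallel gating.
        xg = x.reshape(b * self.groups, c // self.groups, h, w)
        ca = self.channel_gate(xg)                        # (B*G, gc, 1, 1)
        pooled = torch.cat([xg.mean(dim=1, keepdim=True),
                            xg.amax(dim=1, keepdim=True)], dim=1)
        sa = self.spatial_gate(pooled)                    # (B*G, 1, H, W)
        out = xg * ca * sa                                # collaborative gating
        return out.reshape(b, c, h, w)

# Usage: y = GroupCollabAttention(256)(torch.randn(2, 256, 64, 64))
```

Folding the groups into the batch dimension lets one set of gate weights serve every group, which keeps the sketch small; the paper's module may instead learn per-group parameters.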
Result
Experiments on the self-built TULCD dataset and the public Vaihingen dataset compared the proposed method with five recent algorithms. On the two datasets, the proposed method achieves a mean pixel accuracy (mPA) of 74.23% and 87.26%, a mean intersection over union (mIoU) of 58.91% and 77.02%, and a mean F1 score (mF1) of 72.24% and 86.35%, respectively, outperforming all compared algorithms.
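For reference, all three reported metrics can be derived from a per-class confusion matrix. The sketch below is a generic implementation of the standard definitions, not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """mPA, mIoU, and mF1 from a (num_classes x num_classes) confusion
    matrix where conf[i, j] counts pixels of true class i predicted as j."""
    tp = np.diag(conf).astype(float)
    gt = conf.sum(axis=1)      # ground-truth pixels per class (TP + FN)
    pred = conf.sum(axis=0)    # predicted pixels per class (TP + FP)
    pa = tp / np.maximum(gt, 1)                 # per-class pixel accuracy
    iou = tp / np.maximum(gt + pred - tp, 1)    # per-class IoU
    f1 = 2 * tp / np.maximum(gt + pred, 1)      # per-class F1 (Dice)
    return pa.mean(), iou.mean(), f1.mean()     # mPA, mIoU, mF1
```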
Conclusion
The proposed semantic segmentation model for high-resolution remote sensing images has strong spatial and detail perception capabilities and can also discriminate adjacent ground objects with small inter-class differences; its overall segmentation accuracy is high.
Objective
Semantic segmentation plays a crucial role in the intelligent interpretation of remote sensing images. With the rapid advancement of remote sensing technology and the burgeoning field of big-data mining, semantic segmentation of remote sensing images has become increasingly pivotal across diverse applications, such as natural resource surveys, mineral exploration, water quality monitoring, and vegetation ecological assessment. The expansive coverage of remote sensing images, coupled with intricate background intersections and considerable variation in the sizes of ground objects, poses substantial challenges for this task. Existing methods exhibit limited segmentation accuracy, particularly when confronted with multiscale objects within intricate backgrounds, and the resulting segmentation boundaries often appear fuzzy and discontinuous. Thus, a cross-layer detail perception and group attention-guided semantic segmentation network (CDGCANet) is proposed for high-resolution remote sensing images.
Method
First, the ConvNeXt backbone, a network with a novel structure, is used to encode the features of the input image at each level. ConvNeXt combines the popular Transformer architecture with the classic convolutional neural network architecture: it takes advantage of both mainstream designs, adopting the design strategies of Swin Transformer to modernize the structure of ResNet-50. Second, a group collaborative attention module is designed to model the feature dependencies of the channel and spatial dimensions in parallel, thereby capturing the spatial and channel relationships of multiscale features and promoting information interaction between channels. Channel attention and spatial attention collaboratively enhance the feature information of important channels and regions, improving the network's ability to discriminate multiscale features, especially small targets. Next, a self-attention mechanism is introduced to construct the cross-layer detail-aware module (CDM), which uses the rich detail information in low-level features to guide the high-level feature layers in learning spatial details, ensuring the regional integrity and boundary continuity of the segmentation results. During encoding in a semantic segmentation network, shallow features carry strong detail information but poor semantic consistency because of their limited receptive fields, whereas deep features have coarse spatial information because of their low resolution and cannot recover fine details; this leads to problems such as missing and discontinuous segmentation edges. The CDM exploits the spatial information of the earlier layers to guide the learning of detailed features in the deeper layers, thereby ensuring semantic consistency between low-level and high-level features. Finally, Taiyuan City, Shanxi Province is taken as the research area, and the high-resolution remote sensing Taiyuan urban land cover dataset (TULCD) is self-built. Its original remote sensing imagery is extracted from the 1 m-level data source of the domestic Gaofen-2 satellite; the original image measures 56 251 × 52 654 pixels with an overall size of 12.7 GB. An overlap tiling strategy is used to crop the large remote sensing image, with a sliding window of 512 pixels and a step size of 256 pixels, producing 512 × 512 pixel images. A total of 6 607 images are obtained, and the dataset is divided at an 8∶2 ratio into 5 285 training images and 1 322 validation images. The proposed method thus realizes fine-grained classification of land cover in the urban area of Taiyuan City.
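One plausible reading of the CDM is a cross-attention in which the upsampled deep features form the queries and the detail-rich shallow features supply the keys and values. The sketch below illustrates that reading only; the class name, the projection width dim, and the residual connection are assumptions, and the paper's exact formulation may differ.

```python
# Hypothetical sketch of cross-layer detail attention: deep semantics
# query shallow spatial detail. Not the authors' exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerDetailAttention(nn.Module):
    def __init__(self, high_ch: int, low_ch: int, dim: int = 64):
        super().__init__()
        self.q = nn.Conv2d(high_ch, dim, 1)   # queries from deep features
        self.k = nn.Conv2d(low_ch, dim, 1)    # keys from shallow features
        self.v = nn.Conv2d(low_ch, dim, 1)    # values carry the detail
        self.proj = nn.Conv2d(dim, high_ch, 1)
        self.scale = dim ** -0.5

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # Bring the deep features up to the shallow resolution first.
        high_up = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                                align_corners=False)
        b, _, h, w = high_up.shape
        q = self.q(high_up).flatten(2).transpose(1, 2)    # (B, HW, d)
        k = self.k(low).flatten(2)                        # (B, d, HW)
        v = self.v(low).flatten(2).transpose(1, 2)        # (B, HW, d)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return high_up + self.proj(out)   # residual keeps the semantics
```

Note that at 512 × 512 inputs the full HW × HW attention matrix would be prohibitively large, so a practical implementation would apply this only to downsampled or windowed feature maps.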
Result
The experiments compared CDGCANet with five recent algorithms (namely, UNet, PSPNet, DeeplabV3+, A2-FPN, and Swin Transformer) on the self-built TULCD dataset and the public Vaihingen dataset, and three evaluation metrics were used to assess model performance. The proposed CDGCANet outperforms the other algorithms on the TULCD dataset, with a mean pixel accuracy (mPA) of 74.23%, a mean intersection over union (mIoU) of 58.91%, and a mean F1 score (mF1) of 72.24%, exceeding the second-ranked model, PSPNet, by 4.61% in mPA, 1.58% in mIoU, and 1.63% in mF1. On the Vaihingen dataset, CDGCANet achieves 83.22%, 77.62%, and 86.26% for mPA, mIoU, and mF1, respectively; these values are 1.86%, 2.62%, and 2.06% higher than those of the second-ranked model, DeeplabV3+. The visualization results show that the model correctly identifies ground-object targets with complete segmentation regions, clear details, and continuous edges. In addition, the neural network visualization tool Grad-CAM is used to inspect the category heat maps output by the model; the results show that the attention mechanism helps the model focus on key regions and ground objects and enhances its feature expression ability.
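Such heat maps can be reproduced with any Grad-CAM implementation. Below is a minimal from-scratch sketch for a segmentation model; the choice of target_layer and the sum-aggregation of the class map are assumptions, since the paper does not specify how the visualization was configured.

```python
# Minimal Grad-CAM sketch for a segmentation network: the class map is
# a ReLU of gradient-weighted activations at a chosen conv layer.
import torch
import torch.nn.functional as F

def grad_cam(model, image, class_idx, target_layer):
    """image: (1, 3, H, W); target_layer: any conv layer inside model
    (which layer to probe is an assumption, not stated in the paper)."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    try:
        logits = model(image)                  # (1, num_classes, H, W)
        score = logits[0, class_idx].sum()     # aggregate class evidence
        model.zero_grad()
        score.backward()
    finally:
        h1.remove(); h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)        # channel weights
    cam = F.relu((w * acts[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze().detach()
```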
Conclusion
The semantic segmentation model for high-resolution remote sensing images proposed in this study has strong spatial and detail perception capabilities; it not only improves the accuracy of semantic segmentation but also yields more satisfactory segmentation results on complex remote sensing images. Looking ahead, further optimization and in-depth research are anticipated to advance the practical application of this model and contribute to breakthroughs in remote sensing image interpretation.
Keywords: remote sensing images; semantic segmentation; fully convolutional network (FCN); attention mechanisms; group convolution
Badrinarayanan V, Kendall A and Cipolla R. 2017. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12): 2481-2495 [DOI: 10.1109/TPAMI.2016.2644615]
Bai H W, Cheng J, Huang X, Liu S Y and Deng C J. 2022. HCANet: a hierarchical context aggregation network for semantic segmentation of high-resolution remote sensing images. IEEE Geoscience and Remote Sensing Letters, 19: #6002105 [DOI: 10.1109/LGRS.2021.3063799]
Bressan P O, Junior J M, Martins J A C, De Melo M J, Gonçalves D N, Freitas D M, Ramos A P M, Furuya M T G, Osco L P, De Andrade Silva J, Luo Z P, Garcia R C, Ma L F, Li J and Gonçalves W N. 2022. Semantic segmentation with labeling uncertainty and class imbalance applied to vegetation mapping. International Journal of Applied Earth Observation and Geoinformation, 108: #102690 [DOI: 10.1016/J.JAG.2022.102690]
Chen L C, Papandreou G, Schroff F and Adam H. 2017. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2023-09-26]. https://arxiv.org/pdf/1706.05587.pdf
Chen W T, Ouyang S B, Tong W, Li X J, Zheng X W and Wang L Z. 2022. GCSANet: a global context spatial attention deep learning network for remote sensing scene classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15: 1150-1162 [DOI: 10.1109/JSTARS.2022.3141826]
Filippo M P, da Fonseca Martins Gomes O, da Costa G A O P and Mota G L A. 2021. Deep learning semantic segmentation of opaque and non-opaque minerals from epoxy resin in reflected light microscopy images. Minerals Engineering, 170: #107007 [DOI: 10.1016/J.MINENG.2021.107007]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Li R, Wang L B, Zhang C, Duan C X and Zheng S Y. 2022. A2-FPN for semantic segmentation of fine-resolution remotely sensed images. International Journal of Remote Sensing, 43(3): 1131-1155 [DOI: 10.1080/01431161.2022.2030071]
Li X W, Li Y S and Zhang Y J. 2021. Weakly supervised deep semantic segmentation network for water body extraction based on multi-source remote sensing imagery. Journal of Image and Graphics, 26(12): 3015-3026 [DOI: 10.11834/jig.200192]
Lin G S, Milan A, Shen C H and Reid I. 2017. RefineNet: multi-path refinement networks for high-resolution semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5168-5177 [DOI: 10.1109/CVPR.2017.549]
Liu S, Li X Y, Yu M and Xing G L. 2023. Dual decoupling semantic segmentation model for high-resolution remote sensing images. Acta Geodaetica et Cartographica Sinica, 52(4): 638-647 [DOI: 10.11947/j.AGCS.2023.20210455]
Liu Z, Mao H Z, Wu C Y, Feichtenhofer C, Darrell T and Xie S N. 2022. A ConvNet for the 2020s//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11966-11976 [DOI: 10.1109/CVPR52688.2022.01167]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/CVPR.2015.7298965]
Qin X M, Wu Z Y, Luo X W, Li B, Zhao D N, Zhou J Q, Wang M W, Wan H Y and Chen X L. 2023. Temporal fusion based 1-D sequence semantic segmentation model for automatic precision side scan sonar bottom tracking. IEEE Transactions on Geoscience and Remote Sensing, 61: #4201816 [DOI: 10.1109/TGRS.2023.3245603]
Ronneberger O, Fischer P and Brox T. 2015. U-net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer International Publishing: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Saltiel T M, Dennison P E, Campbell M J, Thompson T R and Hambrecht K R. 2022. Tradeoffs between UAS spatial resolution and accuracy for deep learning semantic segmentation applied to wetland vegetation species mapping. Remote Sensing, 14(11): #2703 [DOI: 10.3390/rs14112703]
Tan X W, Xiao Z F, Zhang Y R, Wang Z J, Qi X L and Li D R. 2023. Context-driven feature-focusing network for semantic segmentation of high-resolution remote sensing images. Remote Sensing, 15(5): #1348 [DOI: 10.3390/rs15051348]
Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M and Hu Q H. 2020. ECA-Net: efficient channel attention for deep convolutional neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11531-11539 [DOI: 10.1109/CVPR42600.2020.01155]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Wu Q Q, Wang S, Wang B and Wu Y L. 2022. Road extraction method of high-resolution remote sensing image on the basis of the spatial information perception semantic segmentation model. National Remote Sensing Bulletin, 26(9): 1872-1885 [DOI: 10.11834/jrs.20210021]
Wu T Y, Tang S, Zhang R, Cao J and Zhang Y D. 2021. CGNet: a light-weight context guided network for semantic segmentation. IEEE Transactions on Image Processing, 30: 1169-1179 [DOI: 10.1109/TIP.2020.3042065]
Zhang Q L and Yang Y B. 2021. SA-Net: shuffle attention for deep convolutional neural networks//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Canada: IEEE: 2235-2239 [DOI: 10.1109/ICASSP39728.2021.9414568]
Zhao H S, Shi J P, Qi X J, Wang X G and Jia J Y. 2017. Pyramid scene parsing network//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6230-6239 [DOI: 10.1109/CVPR.2017.660]
Zhou Y Z, Sun X Y, Zha Z J and Zeng W J. 2019. Context-reinforced semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4046-4055 [DOI: 10.1109/CVPR.2019.00417]