Combination of latent diffusion and U-shaped networks for HIFU treatment target region extraction
2024, Vol. 29, No. 5: 1291-1306
Print publication date: 2024-05-16
DOI: 10.11834/jig.230516
Zhai Jintao, Wang Runmin, Li Ang, Tian Feng, Gong Jinru, Qian Shengyou, Zou Xiao. 2024. Combination of latent diffusion and U-shaped networks for HIFU treatment target region extraction. Journal of Image and Graphics, 29(05):1291-1306
Objective
Because of data-acquisition constraints and privacy protection, the amount of ultrasound monitoring image data for high intensity focused ultrasound (HIFU) treatment is too small, so existing fully supervised segmentation methods extract the treatment target region poorly. Therefore, this paper proposes a method for extracting the HIFU treatment target region that combines a latent diffusion model and a U-shaped network.
Method
In the generation stage, a latent diffusion model and an automatic filtering module are used to augment the ultrasound monitoring image data. In the target-region extraction stage, a novel U-shaped segmentation network (NUNet) is proposed: atrous spatial pyramid pooling (ASPP) is combined at the encoder side to enlarge the network's receptive field; a dual attention skip connection module (DAttention-SK) is designed to reduce the risk of losing edge-texture information; and multiple cross-entropy losses are introduced to improve the segmentation performance of the network.
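ASPP builds on atrous (dilated) convolution, which widens the receptive field without adding weights by sampling the input with gaps. The following is a minimal 1-D pure-Python sketch of that idea only; the paper's ASPP applies several 2-D dilated convolutions in parallel and fuses their outputs, and the function name here is illustrative:

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D convolution with a dilation factor.

    With dilation d, a kernel of length k spans (k - 1) * d + 1
    input samples, so a larger d gives a wider receptive field
    with the same number of weights.
    """
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[start + j * dilation]
                       for j in range(len(kernel))))
    return out

signal = [0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]  # same 3 weights in both cases

out_d1 = dilated_conv1d(signal, kernel, 1)  # each output sees 3 samples
out_d2 = dilated_conv1d(signal, kernel, 2)  # each output sees 5 samples
```

An ASPP-style block would run branches with several dilation rates (e.g., 1, 2, 4) over the same input and concatenate the results, capturing context at multiple scales at once.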
Result
Experimental results show that, compared with other generative models, the ultrasound monitoring images generated by the latent diffusion model achieve better FID (Fréchet inception distance) and LPIPS (learned perceptual image patch similarity) scores (0.172 and 0.072, respectively). Compared with the state-of-the-art PDF-UNet (U-shaped pyramid-dilated network) on an ultrasound monitoring dataset from HIFU clinical treatment of uterine fibroids, the proposed segmentation algorithm improves MIoU (mean intersection over union) and DSC (Dice similarity coefficient) by 2.67% and 1.39%, respectively. To further examine the generalization of the proposed algorithm, it was validated on the public breast ultrasound images dataset (BUSI). Compared with M2SNet (multi-scale in multi-scale subtraction network), the proposed algorithm improves MIoU and DSC by 2.11% and 1.36%, respectively.
Conclusion
The proposed algorithm alleviates, to a certain extent, the problem of scarce ultrasound monitoring image data and achieves accurate extraction of target regions in monitoring ultrasound images. The code is available at https://github.com/425877/based-on-latent-diffusion-model-for-HIFU-treatment-target-region-extraction.
Objective
In high intensity focused ultrasound (HIFU) treatment, the target area contains a large amount of pathological information; thus, the target area must be accurately located and extracted from ultrasound monitoring images. As biological tissues and target regions change their relative positions during treatment, the location of the treatment area may also change. At the same time, the diversity of diseases, the variability of tissues, and the complexity of target shapes pose challenges for target-region extraction in ultrasound medical images. Nevertheless, computers can use advanced image processing and analysis algorithms, combined with big data and machine learning methods, to identify and locate target areas quickly and accurately, providing a reliable basis for quantitative clinical analysis. Traditional image segmentation algorithms mainly include threshold segmentation, edge detection, and region growing. However, these methods have limitations: they are sensitive to the complexity of ultrasound images, noise, and other image quality issues, so the accuracy and robustness of their segmentation results are poor. Moreover, traditional methods usually require manual parameter selection, which limits their adaptive and generalization capabilities and makes them strongly dependent on the particular image. In recent years, deep learning-based methods have attracted widespread attention and made remarkable progress in medical image segmentation. Most of these methods are trained under strong supervision, yet such training requires a large amount of data to achieve good predictions. The amount of HIFU therapy ultrasound surveillance image data is small because of patient privacy, differences in acquisition devices, and the need for manual labeling of target areas by specialized physicians. Consequently, the network cannot be adequately trained, and the segmentation results are poor in accuracy and robustness. Therefore, this study proposes a method for extracting the target region of HIFU treatment by combining latent diffusion and a U-shaped network.
Method
First, we train latent diffusion using existing ultrasound surveillance images and their masks; the masks are fed into the model as condition vectors so that the generated ultrasound surveillance images share the same contours. To further ensure that the quality of the generated images is close to that of the original images, we design an automatic filtering module that computes the Fréchet inception distance (FID) of the generated images with respect to the original images and discards images whose FID exceeds a set threshold, making the expansion of the ultrasound surveillance data reliable. Second, we propose a novel U-shaped segmentation network (NUNet), whose backbone adopts the encoder and decoder of U-Net. Atrous spatial pyramid pooling (ASPP) is combined at the encoder side to enlarge the receptive field of the network and extract image features more efficiently. Inspired by spatial and channel attention mechanisms, we design a dual attention skip connection module (DAttention-SK) to replace the original skip connection layer, which improves the fusion of low-level and high-level information and reduces the risk of losing information such as edge texture. At the same time, multiple cross-entropy losses are incorporated to supervise the network to retain useful details and contextual information. Finally, the images generated by latent diffusion are combined with the existing ultrasound surveillance images as the training set, reducing segmentation errors caused by data scarcity in ultrasound surveillance images and further improving segmentation accuracy.
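The FID-threshold filtering step can be sketched as follows. This is a simplified illustration, not the paper's implementation: real FID is computed on Inception-v3 features with full covariance matrices and a matrix square root, whereas this toy version assumes diagonal covariances (under which the trace term reduces to a sum of squared standard-deviation differences); the feature vectors, function names (`fid_diagonal`, `filter_generated`), and threshold are all hypothetical.

```python
import math
from statistics import mean, pvariance

def fid_diagonal(feats_a, feats_b):
    """Fréchet distance between two sets of feature vectors,
    assuming diagonal covariances:
    d^2 = ||mu_a - mu_b||^2 + sum_i (sigma_a_i - sigma_b_i)^2
    """
    dims = range(len(feats_a[0]))
    mu_a = [mean(f[i] for f in feats_a) for i in dims]
    mu_b = [mean(f[i] for f in feats_b) for i in dims]
    sd_a = [math.sqrt(pvariance([f[i] for f in feats_a])) for i in dims]
    sd_b = [math.sqrt(pvariance([f[i] for f in feats_b])) for i in dims]
    return (sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
            + sum((a - b) ** 2 for a, b in zip(sd_a, sd_b)))

def filter_generated(real_feats, candidate_batches, threshold):
    """Keep only generated batches whose distance to the real
    image statistics stays below the threshold."""
    return [c for c in candidate_batches
            if fid_diagonal(real_feats, c) < threshold]

# Toy 2-D features: one candidate batch close to the real
# distribution, one far from it.
real = [[0.0, 1.0], [0.2, 1.2], [-0.2, 0.8]]
close = [[0.1, 1.0], [0.1, 1.1], [-0.1, 0.9]]
far = [[5.0, 9.0], [5.2, 9.2], [4.8, 8.8]]

kept = filter_generated(real, [close, far], threshold=1.0)
```

Here only the `close` batch survives the filter; in the paper's pipeline, the surviving generated images are merged with the original surveillance images to form the enlarged training set.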
Result
All experiments were implemented in PyTorch on an NVIDIA GeForce RTX 3080 GPU. We trained latent diffusion using datasets collected from clinical treatments and assessed the quality of the generated images by FID. For the generative network, the initial learning rate was set to 1 × 10⁻⁴, the batch size to 2, and the number of training epochs to 200. For the segmentation network, the initial learning rate was set to 1 × 10⁻⁴, the batch size to 24, and the number of training epochs to 100. To verify the superiority of the proposed method, we compared it with popular generative and segmentation models. Experimental results showed that the ultrasound surveillance images generated using latent diffusion exhibited better FID and learned perceptual image patch similarity (LPIPS) metrics than other generative models (0.172 and 0.072, respectively). On the training set of ultrasound surveillance images of uterine fibroids clinically treated with HIFU, the proposed segmentation algorithm improved the mean intersection over union (MIoU) and Dice similarity coefficient (DSC) by 2.67% and 1.39%, respectively, compared with the state-of-the-art PDF-UNet. Validation was continued on a breast ultrasound image dataset to explore the generalization of the proposed algorithm. Compared with the state-of-the-art M2SNet, the proposed algorithm improved MIoU and DSC by 2.11% and 1.36%, respectively.
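The MIoU and DSC reported above are averages of per-image IoU and Dice scores. A minimal sketch of the two metrics on flat binary masks follows; the paper's exact evaluation details (per-class averaging, mask resolution) are not shown, and the function name is illustrative:

```python
def iou_and_dice(pred, target):
    """IoU and Dice coefficient for flat binary masks (0/1 lists).

    IoU  = |P ∩ T| / |P ∪ T|
    Dice = 2|P ∩ T| / (|P| + |T|)
    Both return 1.0 when pred and target are both empty.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return iou, dice

pred   = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)  # overlap 2, union 4
```

Dice weights the overlap more heavily than IoU (Dice = 2·IoU / (1 + IoU)), which is why DSC figures in the paper are consistently higher than the corresponding MIoU figures.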
Conclusion
A method for extracting the target region of HIFU treatment was proposed by combining latent diffusion and a U-shaped network. For the first time, latent diffusion was introduced into the generation of ultrasound surveillance images for HIFU treatment, alleviating the problems of insufficient dataset diversity and data scarcity. Combining ASPP and the dual attention skip connection module in the segmentation network reduces the risk of losing information such as the edge texture of the target region. The proposed algorithm solves the problem of insufficient dataset diversity in surveillance ultrasound images to a certain extent and realizes accurate extraction of target regions in surveillance ultrasound images.
Keywords: high intensity focused ultrasound (HIFU); image segmentation; image generation; loss function; latent diffusion
Alom M Z, Yakopcic C, Hasan M, Taha T M and Asari V K. 2019. Recurrent residual U-Net for medical image segmentation. Journal of Medical Imaging, 6(1): #014006 [DOI: 10.1117/1.JMI.6.1.014006]
Bargsten L and Schlaefer A. 2020. SpeckleGAN: a generative adversarial network with an adaptive speckle layer to augment limited training data for ultrasound image processing. International Journal of Computer Assisted Radiology and Surgery, 15(9): 1427-1436 [DOI: 10.1007/s11548-020-02203-1]
Cao H, Wang Y Y, Chen J, Jiang D S, Zhang X P, Tian Q and Wang M N. 2022. Swin-Unet: Unet-like pure Transformer for medical image segmentation//Proceedings of European Conference on Computer Vision. Tel Aviv, Israel: Springer: 205-218 [DOI: 10.1007/978-3-031-25066-8_9]
Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: Transformers make strong encoders for medical image segmentation [EB/OL]. [2023-07-31]. https://arxiv.org/pdf/2102.04306.pdf
Chen L C, Zhu Y, Papandreou G, Schroff F and Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 801-818 [DOI: 10.1007/978-3-030-01234-2_49]
Dhariwal P and Nichol A Q. 2021. Diffusion models beat GANs on image synthesis//Proceedings of the 35th Conference on Neural Information Processing Systems. OpenReview.net: 8780-8794
Gao G Q and Ogawara K. 2020. CGAN-based synthetic medical image augmentation between retinal fundus images and vessel segmented images//Proceedings of the 5th International Conference on Control and Robotics Engineering (ICCRE). Osaka, Japan: IEEE: 218-223 [DOI: 10.1109/ICCRE49379.2020.9096438]
Gu Z W, Cheng J, Fu H Z, Zhou K, Hao H Y, Zhao Y T, Zhang T Y, Gao S H and Liu J. 2019. CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38(10): 2281-2292 [DOI: 10.1109/TMI.2019.2903562]
He H and Chen S. 2020. Automatic tumor segmentation in PET by deep convolutional U-Net with pretrained encoder. Journal of Image and Graphics, 25(1): 171-179 [DOI: 10.11834/jig.190058]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Iqbal A and Sharif M. 2023. PDF-UNet: a semi-supervised method for segmentation of breast tumor images using a U-shaped pyramid-dilated network. Expert Systems with Applications, 221: #119718 [DOI: 10.1016/j.eswa.2023.119718]
Jin Q G, Meng Z P, Pham T D, Chen Q, Wei L Y and Su R. 2019. DUNet: a deformable network for retinal vessel segmentation. Knowledge-Based Systems, 178: 149-162 [DOI: 10.1016/j.knosys.2019.04.025]
Jin S Z, Yu S, Peng J, Wang H Y and Zhao Y. 2023. A novel medical image segmentation approach by using multi-branch segmentation network based on local and global information synchronous learning. Scientific Reports, 13(1): #6762 [DOI: 10.1038/s41598-023-33357-y]
Karras T, Laine S, Aittala M, Hellsten J, Lehtinen J and Aila T. 2020. Analyzing and improving the image quality of StyleGAN//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8107-8116 [DOI: 10.1109/CVPR42600.2020.00813]
Liu Y, Wu R R, Tang L and Song N N. 2022. U-Net-based mediastinal lymph node segmentation method in bronchial ultrasound elastic images. Journal of Image and Graphics, 27(10): 3082-3091 [DOI: 10.11834/jig.210225]
Macháček R, Mozaffari L, Sepasdar Z, Parasa S, Halvorsen P, Riegler M A and Thambawita V. 2023. Mask-conditioned latent diffusion for generating gastrointestinal polyp images//Proceedings of the 4th ACM Workshop on Intelligent Cross-Data Analysis and Retrieval. Thessaloniki, Greece: ACM: #3592978 [DOI: 10.1145/3592571.3592978]
Middel L, Palm C and Erdt M. 2019. Synthesis of medical images using GANs//Proceedings of the 1st Workshop on Clinical Image-Based Procedures. Shenzhen, China: Springer: 125-134 [DOI: 10.1007/978-3-030-32689-0_13]
Oktay O, Schlemper J, Folgoc L L, Lee M, Heinrich M, Misawa K, Mori K, Mcdonagh S, Hammerla N Y, Kainz B, Glocker B and Rueckert D. 2018. Attention U-Net: learning where to look for the pancreas [EB/OL]. [2023-07-31]. https://arxiv.org/pdf/1804.03999.pdf
Pinaya W H, Tudosiu P D, Dafflon J, Da Costa P F, Fernandez V, Nachev P, Ourselin S and Cardoso M J. 2022. Brain imaging generation with latent diffusion models//Proceedings of the 2nd MICCAI Workshop on Deep Generative Models. Singapore: Springer: 117-126 [DOI: 10.1007/978-3-031-18576-2_12]
Rombach R, Blattmann A, Lorenz D, Esser P and Ommer B. 2022. High-resolution image synthesis with latent diffusion models//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 10674-10685 [DOI: 10.1109/CVPR52688.2022.01042]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241 [DOI: 10.1007/978-3-319-24574-4_28]
Sun Y, Zhou C F, Fu Y W and Xue X Y. 2019. Parasitic GAN for semi-supervised brain tumor segmentation//Proceedings of 2019 IEEE International Conference on Image Processing (ICIP). Taipei, China: IEEE: 1535-1539 [DOI: 10.1109/ICIP.2019.8803073]
Wang H N, Cao P, Wang J Q and Zaiane O R. 2022. UCTransNet: rethinking the skip connections in U-Net from a channel-wise perspective with Transformer//Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI: 2441-2449 [DOI: 10.1609/aaai.v36i3.20144]
Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M and Hu Q H. 2020. ECA-Net: efficient channel attention for deep convolutional neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11531-11539 [DOI: 10.1109/CVPR42600.2020.01155]
Woo S, Park J, Lee J Y and Kweon I S. 2018. CBAM: convolutional block attention module//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 3-19 [DOI: 10.1007/978-3-030-01234-2_1]
Yang Y, Pan X D and Zhuang J. 2015. An adaptive threshold setting method for image processing in ultrasonic testing. Journal of Xi'an Jiaotong University, 49(1): 127-132 [DOI: 10.7652/xjtuxb201501021]
Zhao H S, Zhang Y, Liu S, Shi J P, Loy C C, Lin D H and Jia J Y. 2018. PSANet: point-wise spatial attention network for scene parsing//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 267-283 [DOI: 10.1007/978-3-030-01240-3_17]
Zhao X Q, Jia H P, Pang Y W, Lyu L, Tian F, Zhang L H, Sun W B and Lu H C. 2023. M2SNet: multi-scale in multi-scale subtraction network for medical image segmentation [EB/OL]. [2023-07-31]. https://arxiv.org/pdf/2303.10894.pdf
Zhou Z W, Siddiquee M M R, Tajbakhsh N and Liang J M. 2020. UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging, 39(6): 1856-1867 [DOI: 10.1109/TMI.2019.2959609]
Zhu W T, Xiang X, Tran T D, Hager G D and Xie X H. 2018. Adversarial deep structured nets for mass segmentation from mammograms//Proceedings of the 15th IEEE International Symposium on Biomedical Imaging. Washington, USA: IEEE: 847-850 [DOI: 10.1109/ISBI.2018.8363704]