Method of classifying benign and malignant breast tumors by DCE-MRI incorporating local and global features
2024, Vol. 29, No. 1: 256-267
Print publication date: 2024-01-16
DOI: 10.11834/jig.230092
Zhao Xiaoming, Liao Yuehui, Zhang Shiqing, Fang Jiangxiong, He Xiaxia, Wang Guoyu, Lu Hongsheng. 2024. Method of classifying benign and malignant breast tumors by DCE-MRI incorporating local and global features. Journal of Image and Graphics, 29(01):0256-0267
Objective
Computer-aided detection and classification of breast tumors in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) suffer from low accuracy and a lack of available datasets.
Method
To address these problems, a breast DCE-MRI image dataset is built, and a local-global cross-attention fusion network (LG-CAFN) is proposed that combines a convolutional neural network (CNN) for local feature learning with a vision Transformer (ViT) for global feature learning, enabling automatic diagnosis of breast tumor DCE-MRI images and improving the accuracy and efficiency of breast cancer diagnosis. The network uses a cross-attention mechanism to effectively fuse the local image features extracted by the CNN branch with the global image features extracted by the ViT branch, yielding more discriminative image features for the benign-malignant classification of breast tumor DCE-MRI images. The overall two-branch design is sketched below.
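The following minimal PyTorch sketch illustrates the two-branch idea only; it is not the authors' implementation. A small CNN stem stands in for the SENet stages, a single Transformer encoder layer stands in for ViT-7, and plain feature concatenation stands in for the paper's cross-attention fusion; all module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class TwoBranchSketch(nn.Module):
    """Toy local CNN branch + global Transformer branch with late fusion."""

    def __init__(self, dim=64, num_classes=2):
        super().__init__()
        # Local branch: a small CNN stem (a stand-in for the SENet stages).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Global branch: patch embedding + one encoder layer (stand-in for ViT-7).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                  batch_first=True)
        # Plain concatenation replaces the paper's cross-attention fusion here.
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x):
        local_feat = self.cnn(x).mean(dim=(2, 3))                # (B, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        global_feat = self.encoder(tokens).mean(dim=1)           # (B, dim)
        return self.head(torch.cat([local_feat, global_feat], dim=1))


logits = TwoBranchSketch()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2]) -> benign/malignant scores
```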
Result
Two groups of experiments covering different types of breast DCE-MRI sequences were set up on the breast DCE-MRI dataset, comparing the proposed method with VGG16 (Visual Geometry Group 16-layer network), the deep residual network (ResNet), SENet (squeeze-and-excitation network), ViT, and Swin-S (Swin-Transformer-small). Ablation experiments and comparisons with other methods were also conducted. In the two groups of experiments, LG-CAFN achieved the highest accuracy, 88.20% and 83.93%, respectively, on the benign-malignant classification task, and its areas under the receiver operating characteristic (ROC) curve (AUC) reached 0.9154 and 0.8826, outperforming all other methods and coming closest to 1.
Conclusion
The proposed LG-CAFN method has excellent local-global feature learning capability and can effectively improve the performance of benign-malignant classification of breast tumor DCE-MRI images.
Objective
Among women in the United States, breast cancer is the most frequently diagnosed cancer apart from nonmelanoma skin cancer, and it is the second leading cause of cancer-related deaths in women after lung cancer. Breast cancer cases have been on the rise in the past few years, but the number of deaths caused by breast cancer has either remained steady or decreased, an outcome likely due to improved early detection techniques and more effective treatment options. Magnetic resonance imaging (MRI), especially dynamic contrast-enhanced (DCE)-MRI, has shown promising results in screening women at high risk of breast cancer and in staging newly diagnosed patients. As a result, DCE-MRI is becoming increasingly recognized as a valuable adjunct diagnostic tool for the timely detection of breast cancer. With the development of artificial intelligence, many deep learning models based on convolutional neural networks (CNNs), such as VGG and ResNet, have been widely used in medical image analysis. These models automatically extract deep features from images, eliminating the need for hand-crafted feature extraction and saving much time and effort. However, CNNs struggle to capture global information, which is very useful for the diagnosis of breast tumors in medical images. To acquire global information, the vision Transformer (ViT) has been proposed and has achieved impressive results in computer vision tasks. ViT uses a convolution operation to split the entire input image into many small image patches; it then processes these patches simultaneously with multihead self-attention layers, capturing global information across different regions of the input image. However, ViT inevitably loses local information while capturing global information. To integrate the advantages of the two, several studies have combined CNN and ViT to obtain more comprehensive feature representations and achieve better performance in breast tumor diagnosis tasks.
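To make the patchify-then-attend mechanism concrete, the short PyTorch snippet below splits an image into 16 × 16 patch tokens with a strided convolution and runs one multihead self-attention layer over them; the patch size, token dimension, and head count are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)                       # a single input image
patchify = nn.Conv2d(3, 64, kernel_size=16, stride=16)  # 16 x 16 patches -> tokens
tokens = patchify(img).flatten(2).transpose(1, 2)       # (1, 196, 64): 14 x 14 patches

# One multihead self-attention layer: every patch attends to every other patch,
# which is what gives the Transformer branch its global receptive field.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
out, weights = attn(tokens, tokens, tokens)
print(out.shape, weights.shape)                         # (1, 196, 64) (1, 196, 196)
```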
Method
Based on the above observations, a novel cross-attention fusion network built on CNN and ViT is proposed, which simultaneously extracts local detail information through the CNN branch and global information through the ViT branch. A nonlocal block is then used to fuse this information for classifying breast tumor DCE-MR images. The model mainly contains three parts: the local CNN and global ViT branches, a feature coupling unit (FCU), and a cross-attention fusion module. The CNN subnetwork uses SENet to capture local information, and the ViT subnetwork captures global information. Because the feature maps extracted by the two branches usually have different dimensions, the FCU is adopted to eliminate the feature dimension misalignment between the two branches. Finally, the nonlocal block computes the correspondences between the two different inputs. The first two stages (stage-1 and stage-2) of SENet50 are adopted as the local CNN subnetwork, and a 7-layer ViT (ViT-7) is adopted as the global subnetwork. Each stage in SENet50 is composed of several residual blocks and SE blocks. Each residual block contains a 1 × 1 convolution layer, a 3 × 3 convolution layer, and a 1 × 1 convolution layer. Each SE block contains a global average pooling layer, two fully connected (FC) layers, and a sigmoid activation function. The number of residual blocks and SE blocks is set to 3 in stage-1 and 4 in stage-2. The 7-layer ViT contains seven encoder layers, each consisting of two LayerNorm layers, a multihead self-attention module, and a simple multilayer perceptron (MLP) block. The FCU contains a 1 × 1 convolution, a BatchNorm layer, and nearest-neighbor interpolation. The nonlocal block consists of four 1 × 1 convolutions and a softmax function.
Result
The model is compared with five other deep learning models, namely VGG16, ResNet50, SENet50, ViT, and Swin-S (Swin-Transformer-small), and two sets of experiments using different breast tumor DCE-MRI sequences are conducted to evaluate its robustness and generalization. The quantitative evaluation metrics are accuracy and the area under the receiver operating characteristic (ROC) curve (AUC). Compared with VGG16 and ResNet50 in the two sets of experiments, the accuracy increases by 3.7% and 3.6%, and the AUC increases by 0.045 and 0.035 on average, respectively. Compared with SENet50 and ViT-7, the accuracy increases by 3.2% and 1.1%, and the AUC increases by 0.035 and 0.025 on average, respectively. Compared with Swin-S, the accuracy increases by 3.0% and 2.6%, and the AUC increases by 0.05 and 0.03. In addition, class activation maps of the learned feature representations are generated to increase the interpretability of the models. A series of ablation experiments is then conducted to prove the effectiveness of the proposed method. Specifically, different fusion strategies, namely feature-level and decision-level fusion, are compared with the cross-attention fusion module. Compared with the feature-level fusion method in the two sets of experiments, the accuracy increases by 1.6% and 1.3%, and the AUC increases by 0.03 and 0.02. Compared with the decision-level fusion method, the accuracy increases by 0.7% and 1.8%, and the AUC increases by 0.02 and 0.04. Finally, comparative experiments with three recent methods, namely RegNet, ConvNeXt, and MobileViT, are also performed. Experimental results fully demonstrate the effectiveness of the proposed method in the breast tumor DCE-MR image classification task.
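For reference, the two reported metrics can be computed as in the generic scikit-learn sketch below; `labels` and `scores` are hypothetical stand-ins for the ground truth and the model's predicted malignancy probabilities, not the paper's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

labels = np.array([0, 1, 1, 0, 1])            # 0 = benign, 1 = malignant
scores = np.array([0.2, 0.9, 0.6, 0.4, 0.8])  # predicted malignancy probabilities

# Accuracy uses hard predictions at a 0.5 threshold; AUC uses the raw scores.
print("accuracy:", accuracy_score(labels, (scores >= 0.5).astype(int)))
print("AUC:", roc_auc_score(labels, scores))
```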
Conclusion
In this paper, a novel local-global cross-attention fusion network (LG-CAFN) based on a local CNN and a global ViT is proposed for the benign and malignant classification of breast tumors in DCE-MR images. Extensive experiments demonstrate the superior performance of the method compared with several state-of-the-art methods. Although the LG-CAFN model is applied here only to the diagnosis of breast tumor DCE-MR images, the approach can be easily transferred to other medical image diagnostic tasks. Therefore, in future work, the approach will be extended to other modalities, such as breast ultrasound and breast CT images. In addition, automatic segmentation of breast DCE-MR images will be explored to analyze these images more comprehensively and help radiologists make more accurate diagnoses.
Keywords: breast tumor; dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI); vision Transformer (ViT); convolutional neural network (CNN); attention fusion
References
Chen H Y, Gao J Y, Zhao D, Wang H Z, Song H and Su Q H. 2021. Review of the research progress in deep learning and biomedical image analysis till 2020. Journal of Image and Graphics, 26(3): 475-486 [DOI: 10.11834/jig.200351]
Chen X X, Zhang K, Abdoli N, Gilley P W, Wang X M, Liu H, Zheng B and Qiu Y C. 2022. Transformers improve breast cancer diagnosis from unregistered multi-view mammograms. Diagnostics, 12(7): #1549 [DOI: 10.3390/diagnostics12071549]
DeSantis C E, Ma J M, Gaudet M M, Newman L A, Miller K D, Goding Sauer A, Jemal A and Siegel R L. 2019. Breast cancer statistics, 2019. CA: A Cancer Journal for Clinicians, 69(6): 438-451 [DOI: 10.3322/caac.21583]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2021. An image is worth 16 × 16 words: Transformers for image recognition at scale [EB/OL]. [2022-10-04]. https://arxiv.org/pdf/2010.11929.pdf
Gheflati B and Rivaz H. 2021. Vision Transformers for classification of breast ultrasound images//Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Glasgow, United Kingdom: IEEE: 480-483 [DOI: 10.1109/EMBC48229.2022.9871809]
Ha R, Mutasa S, Karcich J, Gupta N, Pascual Van Sant E, Nemer J, Sun M, Chang P, Liu M Z and Jambawalikar S. 2019. Predicting breast cancer molecular subtype with MRI dataset utilizing convolutional neural network algorithm. Journal of Digital Imaging, 32(2): 276-282 [DOI: 10.1007/s10278-019-00179-2]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/CVPR.2018.00745]
Jiang N and Wang L. 2015. Quantum image scaling using nearest neighbor interpolation. Quantum Information Processing, 14(5): 1559-1571 [DOI: 10.1007/s11128-014-0841-8]
Khandelwal U, Fan A, Jurafsky D, Zettlemoyer L and Lewis M. 2021. Nearest neighbor machine translation [EB/OL]. [2022-10-04]. https://arxiv.org/pdf/2010.00710.pdf
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI: 10.1145/3065386]
LeCun Y, Bengio Y and Hinton G. 2015. Deep learning. Nature, 521(7553): 436-444 [DOI: 10.1038/nature14539]
Liu Y L, Gao Y and Yin W T. 2020. An improved analysis of stochastic gradient descent with momentum//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 18261-18271
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9992-10002 [DOI: 10.1109/ICCV48922.2021.00986]
Liu Z, Mao H Z, Wu C Y, Feichtenhofer C, Darrell T and Xie S N. 2022. A ConvNet for the 2020s//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11966-11976 [DOI: 10.1109/CVPR52688.2022.01167]
Mahmood T, Li J Q, Pei Y, Akhtar F, Imran A and Rehman K U. 2020. A brief survey on breast cancer diagnostic with deep learning schemes using multi-image modalities. IEEE Access, 8: 165779-165809 [DOI: 10.1109/Access.2020.3021343]
Maicas G, Bradley A P, Nascimento J C, Reid I and Carneiro G. 2019. Pre and post-hoc diagnosis and interpretation of malignancy from breast DCE-MRI. Medical Image Analysis, 58: #101562 [DOI: 10.1016/j.media.2019.101562]
Mehta S and Rastegari M. 2022. MobileViT: light-weight, general-purpose, and mobile-friendly vision Transformer [EB/OL]. [2022-10-04]. https://arxiv.org/pdf/2110.02178.pdf
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z M, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J J and Chintala S. 2019. Pytorch: an imperative style, high-performance deep learning library//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc.: 8026-8037
Peng Z L, Huang W, Gu S Z, Xie L X, Wang Y W, Jiao J B and Ye Q X. 2021. Conformer: local features coupling global representations for visual recognition//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 357-366 [DOI: 10.1109/ICCV48922.2021.00042]
Radosavovic I, Kosaraju R P, Girshick R, He K M and Dollár P. 2020. Designing network design spaces//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 10425-10433 [DOI: 10.1109/CVPR42600.2020.01044]
Salama W M and Aly M H. 2021. Deep learning in mammography images segmentation and classification: automated CNN approach. Alexandria Engineering Journal, 60(5): 4701-4709 [DOI: 10.1016/j.aej.2021.03.048]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 618-626 [DOI: 10.1109/ICCV.2017.74]
Shi J, Wang L L, Wang S S, Chen Y X, Wang Q, Wei D M, Liang S J, Peng J L, Yi J J, Liu S F, Ni D, Wang M L, Zhang D Q and Shen D G. 2020. Applications of deep learning in medical imaging: a survey. Journal of Image and Graphics, 25(10): 1953-1981 [DOI: 10.11834/jig.200255]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2022-10-04]. https://arxiv.org/pdf/1409.1556.pdf
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 6000-6010
Wang C M, Mai X X, Lin G C and Kuo C T. 2008. Classification for breast MRI using support vector machine//Proceedings of the 8th IEEE International Conference on Computer and Information Technology Workshops. Sydney, Australia: IEEE: 362-367 [DOI: 10.1109/CIT.2008.Workshops.90]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/CVPR.2018.00813]
Xing S X, Ju Z H, Liu Z J, Wang Y and Fan F Q. 2023. Multi-label classification of chest X-ray images with a pre-trained vision Transformer model. Journal of Image and Graphics, 28(4): 1186-1197 [DOI: 10.11834/jig.220284]
Zhang J, Saha A, Zhu Z and Mazurowski M A. 2019. Hierarchical convolutional neural networks for segmentation of breast tumors in MRI with application to radiogenomics. IEEE Transactions on Medical Imaging, 38(2): 435-447 [DOI: 10.1109/TMI.2018.2865671]