Rui Wang, Xiaoshuang Shi, Shuting Pang, Yidi Chen, Xiaofeng Zhu, Wentao Wang, Jiabin Cai, Danjun Song, Kang Li
{"title":"Cross-attention guided loss-based deep dual-branch fusion network for liver tumor classification","authors":"Rui Wang , Xiaoshuang Shi , Shuting Pang , Yidi Chen , Xiaofeng Zhu , Wentao Wang , Jiabin Cai , Danjun Song , Kang Li","doi":"10.1016/j.inffus.2024.102713","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, convolutional neural networks (CNNs) and multiple instance learning (MIL) methods have been successfully applied to MRI images. However, CNNs directly utilize the whole image as the model input and the downsampling strategy (like max or mean pooling) to reduce the size of the feature map, thereby possibly neglecting some local details. And MIL methods learn instance-level or local features without considering spatial information. To overcome these issues, in this paper, we propose a novel cross-attention guided loss-based dual-branch framework (LCA-DB) to leverage spatial and local image information simultaneously, which is composed of an image-based attention network (IA-Net), a patch-based attention network (PA-Net) and a cross-attention module (CA). Specifically, IA-Net directly learns image features with loss-based attention to mine significant regions, meanwhile, PA-Net captures patch-specific representations to extract crucial patches related to the tumor. Additionally, the cross-attention module is designed to integrate patch-level features by using attention weights generated from each other, thereby assisting them in mining supplement region information and enhancing the interactive collaboration of the two branches. Moreover, we employ an attention similarity loss to further reduce the semantic inconsistency of attention weights obtained from the two branches. 
Finally, extensive experiments on three liver tumor classification tasks demonstrate the effectiveness of the proposed framework, e.g., on the LLD-MMRI–7, our method achieves 69.2%, 65.9% and 88.5% on the seven-class liver tumor classification tasks in terms of accuracy, F<span><math><msub><mrow></mrow><mrow><mn>1</mn></mrow></msub></math></span> score and AUC, with the superior classification and interpretation performance over recent state-of-the-art methods. The source code of LCA-DB is available at <span><span>https://github.com/Wangrui-berry/Cross-attention</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102713"},"PeriodicalIF":14.7000,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524004913","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Recently, convolutional neural networks (CNNs) and multiple instance learning (MIL) methods have been successfully applied to MRI images. However, CNNs take the whole image as the model input and rely on downsampling (e.g., max or mean pooling) to reduce the size of the feature map, thereby possibly neglecting local details. MIL methods, by contrast, learn instance-level or local features without considering spatial information. To overcome these issues, we propose a novel cross-attention guided loss-based dual-branch framework (LCA-DB) that leverages spatial and local image information simultaneously. It is composed of an image-based attention network (IA-Net), a patch-based attention network (PA-Net), and a cross-attention module (CA). Specifically, IA-Net directly learns image features with loss-based attention to mine significant regions, while PA-Net captures patch-specific representations to extract crucial patches related to the tumor. Additionally, the cross-attention module integrates patch-level features by using attention weights generated from each branch for the other, helping the branches mine supplementary region information and enhancing their interactive collaboration. Moreover, we employ an attention similarity loss to further reduce the semantic inconsistency between the attention weights obtained from the two branches. Finally, extensive experiments on three liver tumor classification tasks demonstrate the effectiveness of the proposed framework; e.g., on the seven-class LLD-MMRI-7 task, our method achieves 69.2% accuracy, 65.9% F1 score, and 88.5% AUC, with superior classification and interpretation performance over recent state-of-the-art methods. The source code of LCA-DB is available at https://github.com/Wangrui-berry/Cross-attention.
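To make the two core ideas of the abstract concrete, here is a minimal NumPy sketch of (a) a cross-attention step that fuses one branch's patch features using queries derived from the other branch, and (b) an attention-similarity penalty that discourages the two branches' attention maps from disagreeing. This is an illustration only, not the authors' implementation: the function names, the single-head scaled dot-product form, and the mean-squared-difference choice of similarity loss are all assumptions; the paper's exact formulation may differ (see the linked repository for the real code).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feats_a, feats_b, w_q, w_k):
    """Fuse branch B's patch features using queries from branch A.

    feats_a, feats_b: (n_patches, d) patch features from the two branches.
    w_q, w_k: (d, d) projection matrices (hypothetical, single-head).
    Returns the fused features and the (n_patches, n_patches) attention map.
    """
    q = feats_a @ w_q
    k = feats_b @ w_k
    # Scaled dot-product attention; each row sums to 1.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ feats_b, attn

def attention_similarity_loss(attn_a, attn_b):
    """Penalize disagreement between the two branches' attention maps
    (mean squared difference here; the paper's loss may be defined differently)."""
    return float(np.mean((attn_a - attn_b) ** 2))
```

In a dual-branch setup, each branch would produce its own attention map over the same set of patches (IA-Net guiding PA-Net and vice versa), and the similarity loss would be added to the classification loss so that both branches attend to semantically consistent regions.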
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Both fundamental theoretical analyses and papers demonstrating their application to real-world problems are welcome.