{"title":"Bilateral Network with Text Guided Aggregation Architecture for Lung Infection Image Segmentation.","authors":"Xiang Pan, Hanxiao Mei, Jianwei Zheng, Herong Zheng","doi":"10.1088/2057-1976/adb290","DOIUrl":null,"url":null,"abstract":"<p><p>Lung image segmentation is a crucial problem for autonomous understanding of the potential illness. However, existing approaches lead to a considerable decrease in accuracy for lung infection areas with varied shapes and sizes. Recently, researchers aimed to improve segmentation accuracy by combining diagnostic reports based on text prompts and image vision information. However, limited by the network structure, these methods are inefficient and ineffective. To address this issue, this paper proposes a Bilateral Network with Text Guided Aggregation Architecture (BNTGAA) to fully fuse local and global information for text and image vision. This proposed architecture involves (i) a global fusion branch with a Hadamard product to align text and vision feature representation and (ii) a multi-scale cross-fusion branch with positional coding and skip connection, performing text-guided segmentation in different resolutions. (iii) The global fusion and multi-scale cross-fusion branches are combined to feed a mamba module for efficient segmentation. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs better both in accuracy and efficiency. Our architecture outperforms the current best methods on the QaTa-COVID19 dataset, improving mIoU and Dice scores by 3.08\\% and 2.35\\%, respectively. Meanwhile, our architecture surpasses the computational speed of existing multimodal networks. Finally, the architecture has a quick convergence and generality. 
It can exceed the performance of the current best methods even if it is trained with only 50\\% of the dataset.</p>","PeriodicalId":8896,"journal":{"name":"Biomedical Physics & Engineering Express","volume":" ","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Physics & Engineering Express","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/2057-1976/adb290","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0
Abstract
Lung image segmentation is crucial for the automated understanding of potential illness. However, existing approaches lose considerable accuracy on lung infection areas with varied shapes and sizes. Recently, researchers have aimed to improve segmentation accuracy by combining diagnostic reports, supplied as text prompts, with image vision information. However, limited by their network structure, these methods are inefficient and ineffective. To address this issue, this paper proposes a Bilateral Network with Text Guided Aggregation Architecture (BNTGAA) that fully fuses local and global information from text and image vision. The proposed architecture involves (i) a global fusion branch that uses a Hadamard product to align text and vision feature representations; (ii) a multi-scale cross-fusion branch with positional coding and skip connections, which performs text-guided segmentation at different resolutions; and (iii) a Mamba module, fed by the combined output of the global fusion and multi-scale cross-fusion branches, for efficient segmentation. Extensive quantitative and qualitative evaluations demonstrate that the proposed architecture performs better in both accuracy and efficiency. The architecture outperforms the current best methods on the QaTa-COVID19 dataset, improving mIoU and Dice scores by 3.08% and 2.35%, respectively. It also surpasses existing multimodal networks in computational speed. Finally, the architecture converges quickly and generalizes well. It exceeds the performance of the current best methods even when trained with only 50% of the dataset.
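The two quantitative ideas named in the abstract, Hadamard-product fusion and the Dice and IoU scores, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `hadamard_fuse` and `dice_and_iou`, the C x H x W feature layout, and the choice to broadcast one text weight per channel are all assumptions made for illustration, and the abstract does not say how the paper averages IoU into mIoU.

```python
from typing import List, Tuple

Mask = List[List[int]]                 # H x W binary mask (0 or 1)
FeatureMap = List[List[List[float]]]   # C x H x W feature map (assumed layout)


def hadamard_fuse(vision: FeatureMap, text: List[float]) -> FeatureMap:
    """Element-wise (Hadamard) product of a vision feature map with a text vector.

    Assumption: the text feature has one value per channel and is broadcast
    over every spatial position before the element-wise multiplication.
    """
    if len(vision) != len(text):
        raise ValueError("vision and text features need the same channel count")
    return [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(vision, text)
    ]


def dice_and_iou(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of the same shape."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    target_area = sum(1 for b in t if b)
    union = pred_area + target_area - intersection
    # Two empty masks agree perfectly, so both scores are defined as 1.0 here.
    dice = 2 * intersection / (pred_area + target_area) if union else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


if __name__ == "__main__":
    # Toy 2-channel, 1 x 2 feature map gated by a 2-value text vector.
    fused = hadamard_fuse([[[1.0, 2.0]], [[3.0, 4.0]]], [0.5, 2.0])
    print(fused)  # [[[0.5, 1.0]], [[6.0, 8.0]]]

    # Toy masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
    dice, iou = dice_and_iou([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
    print(dice, iou)  # Dice = 4/6, IoU = 2/4 = 0.5
```

Averaging the per-image (or per-class) IoU values returned by `dice_and_iou` gives an mIoU-style score; which averaging convention the reported 3.08% gain uses is not specified in the abstract.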
About the journal:
BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by a broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.