
Latest publications in Computerized Medical Imaging and Graphics

Unified model with random penalty entropy loss for robust nasogastric tube placement analysis in X-ray.
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-24 DOI: 10.1016/j.compmedimag.2026.102715
GwiSeong Moon, Kyoung Min Moon, Inseo Park, Kanghee Lee, Doohee Lee, Woo Jin Kim, Yoon Kim, Ji Young Hong, Hyun-Soo Choi

Background and objective: Accurate assessment of nasogastric (NG) tube placement is essential to prevent serious complications. However, manual chest X-ray verification is prone to human error and variability. We propose a unified deep learning model that jointly performs segmentation and classification to improve the generalization and reliability of automated NG tube placement assessment.

Methods: We developed a unified architecture based on nnUNet, which was optimized simultaneously for segmentation and classification. To enhance robustness and reduce overconfidence, we introduce Random Penalty Entropy Loss, which dynamically scales entropy penalties during training. The model was evaluated on internal datasets (5674 chest X-rays from three South Korean hospitals) and an external dataset from MIMIC-CXR.

Results: On the internal test set, the proposed model outperformed the Wang 2-Stage method (F1: 93.94% vs. 87.39%), particularly in ambiguous cases. Baseline models using Focal Loss or Label Smoothing performed well internally but showed substantial performance drops and miscalibration externally. In contrast, our model with Random Penalty Entropy Loss achieved the highest external classification accuracy (F1: 66.34%, AUROC: 84.82%) and superior calibration (MCE: 0.429, ECE: 0.274).

Conclusion: The proposed unified model surpasses existing two-stage approaches in classification and calibration. Incorporating Random Penalty Entropy Loss improves robustness and generalization across diverse clinical settings. These results highlight the model's potential to reduce diagnostic errors and enhance patient safety in NG tube placement assessment.
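Note: the abstract does not spell out the exact form of Random Penalty Entropy Loss. Purely as a hedged illustration, the PyTorch sketch below implements one plausible reading: standard cross-entropy plus an entropy penalty whose coefficient is resampled at every step. The function name and the `max_penalty` hyperparameter are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def random_penalty_entropy_loss(logits, targets, max_penalty=0.5):
    """Cross-entropy plus an entropy penalty with a randomly resampled weight.

    Rewarding higher predictive entropy discourages overconfident predictions;
    drawing a fresh coefficient each step varies the strength of that regularizer.
    """
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    entropy = -(probs * log_probs).sum(dim=1).mean()           # mean predictive entropy
    lam = torch.rand((), device=logits.device) * max_penalty   # random penalty scale
    return ce - lam * entropy                                  # low entropy is penalized

# Toy example: 8 chest X-rays, 2 placement classes (e.g., correct / misplaced)
logits = torch.randn(8, 2, requires_grad=True)
targets = torch.randint(0, 2, (8,))
loss = random_penalty_entropy_loss(logits, targets)
loss.backward()
```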

Citations: 0
Fuzzy rough set loss for deep learning-based precise medical image segmentation
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-23 DOI: 10.1016/j.compmedimag.2026.102716
Mohsin Furkh Dar, Avatharam Ganivada
Accurate segmentation of medical images is crucial for diagnosis and treatment planning, yet it remains challenging due to ambiguous lesion boundaries, class imbalance, and complex anatomical structures. We propose a novel Fuzzy Rough Set-inspired (FRS) loss function that addresses these challenges by integrating pixels’ fuzzy similarity relations with a boundary uncertainty model in a convex combination method. To obtain the boundary uncertainty model, the fuzzy lower and upper approximations of a set of pixels and membership weights are utilized. The FRS loss function enhances boundary sensitivity and handles prediction uncertainty through its dual components: a fuzzy similarity term that captures gradual transitions at lesion boundaries, and boundary uncertainty model that deals with uncertainty and mitigates class imbalance. Extensive experiments across five diverse medical imaging datasets — breast ultrasound, gastrointestinal polyps, brain Magnetic Resonance Imaging (MRI), chest Computed Tomography (CT), and skin lesions — demonstrate the effectiveness of our approach. The FRS loss achieves superior segmentation performance with an average improvement of 2.1% in Dice score compared to the best baseline method, while demonstrating statistically significant improvements across all evaluated metrics (p < 0.001). The FRS loss shows its robustness to moderate class imbalance while maintaining computational efficiency (mean inference time 0.075–0.12 s per image, 4.5 MB memory). These results suggest that the FRS loss function provides a robust and interpretable framework for precise medical image segmentation, particularly in cases with ambiguous boundaries and moderate imbalance. Code: https://github.com/MohsinFurkh/Fuzzy-Rough-Set-Loss.
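Note: the exact FRS formulation is in the linked repository. Purely as an illustration of the convex-combination idea described above, the sketch below mixes a fuzzy (soft Dice-style) similarity term with a boundary uncertainty term. The morphological boundary extraction, the entropy-based uncertainty, and the `alpha` weight are assumptions for the sketch, not the authors' definitions.

```python
import torch
import torch.nn.functional as F

def convex_boundary_aware_loss(pred, target, alpha=0.6, eps=1e-6):
    """Convex combination of a fuzzy similarity term and a boundary uncertainty term.

    pred:   predicted foreground probabilities, shape (N, 1, H, W), values in [0, 1]
    target: binary ground-truth masks, same shape
    alpha:  convex-combination weight between the two terms
    """
    # Fuzzy similarity: soft Dice overlap, treating probabilities as memberships.
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    fuzzy_similarity = (2 * inter + eps) / (union + eps)

    # Boundary band via a cheap morphological gradient (dilation minus erosion).
    dilated = F.max_pool2d(target, 3, stride=1, padding=1)
    eroded = -F.max_pool2d(-target, 3, stride=1, padding=1)
    boundary = (dilated - eroded).clamp(0, 1)

    # Uncertainty: binary predictive entropy, averaged over the boundary band only.
    entropy = -(pred * (pred + eps).log() + (1 - pred) * (1 - pred + eps).log())
    boundary_uncertainty = (entropy * boundary).sum(dim=(1, 2, 3)) / (
        boundary.sum(dim=(1, 2, 3)) + eps)

    return (alpha * (1 - fuzzy_similarity) + (1 - alpha) * boundary_uncertainty).mean()

pred = torch.rand(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
convex_boundary_aware_loss(pred, target).backward()
```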
Citations: 0
Differentiable Neural Architecture Search for medical image segmentation: A systematic review and field audit
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-20 DOI: 10.1016/j.compmedimag.2026.102713
Emil Benedykciuk, Marcin Denkowski, Grzegorz M. Wójcik
Medical image segmentation is critical for diagnosis, treatment planning, and disease monitoring, yet differs from generic semantic segmentation due to volumetric data, modality-specific artifacts, costly and uncertain expert annotations, and domain shift across scanners and institutions. Neural Architecture Search (NAS) can automate model design, but many NAS paradigms become impractical for 3D segmentation because evaluating large numbers of candidate architectures is computationally prohibitive. Differentiable NAS (DNAS) alleviates this barrier by optimizing relaxed architectural choices with gradients in a weight-sharing supernet, making search feasible under realistic compute and memory budgets. However, DNAS introduces distinct methodological risks (e.g., optimization instability and discretization gap) and raises challenges in reproducibility and clinical deployability. We conduct a PRISMA-inspired systematic review of DNAS for medical image segmentation (multi-database screening, 2018-2025), retaining 33 papers representing 31 unique methods for quantitative analysis. Across the included studies, external validation on independent-site data is rare (∼10%), full code release (including search procedures) is limited (∼26%), and only a minority substantively addresses search stability (∼23%). Despite clear clinical relevance, multi-objective search that explicitly optimizes latency or memory is also uncommon (∼23%). We position DNAS within the broader NAS landscape, introduce a segmentation-focused taxonomy, and propose a NAS Reporting Card tailored to medical segmentation to improve transparency, comparability, and reproducibility.
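For readers new to the paradigm, the sketch below shows the standard differentiable relaxation (DARTS-style) that the reviewed methods build on: each edge computes a softmax-weighted sum of candidate operations, and the architecture weights are trained by gradient descent inside a weight-sharing supernet. The candidate operation set here is illustrative and not taken from any reviewed paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style continuous relaxation: the edge output is a softmax-weighted
    sum of candidate operations, and the weights (architecture parameters) are
    learned by gradient descent alongside the shared network weights."""

    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                        # skip connection
            nn.AvgPool2d(3, stride=1, padding=1)  # pooling candidate
        ])
        # One architecture parameter per candidate operation on this edge.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

edge = MixedOp(channels=16)
out = edge(torch.randn(1, 16, 32, 32))
# After search, the edge is typically discretized to argmax(alpha), which is
# where the "discretization gap" mentioned above can appear.
print(out.shape)
```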
Citations: 0
AutoPromptSeg: Automated Decoupling of Uncertainty Prompts with SAM for semi-supervised medical image segmentation
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-17 DOI: 10.1016/j.compmedimag.2026.102708
Junan Zhu, Zhizhe Tang, Ping Ma, Zheng Liang, Chuanjian Wang
The scarcity of high-quality annotated data limits the application of supervised learning in disease diagnosis. Semi-supervised learning (SSL) offers a promising solution to this challenge, utilizing both limited labeled data and large-scale unlabeled data to significantly boost segmentation accuracy. While existing SSL methods focus on model-centric regularization strategies, the emergence of promptable foundation models like Segment Anything (SAM) presents new opportunities for paradigmatic advancement. However, SAM requires datasets with additional prompt annotations to guide the segmentation process, yet most existing medical imaging datasets do not contain them. To address this limitation, we introduce a novel semi-supervised 3D medical image segmentation method called AutoPromptSeg, which generates effective prompts with the Decoupled Uncertainty Prompt Generator (DUPG) while maintaining superior segmentation performance in data-scarce scenarios. Concurrently, we employ the Channel Alignment and Fusion Architecture (CAFA) to align features obtained from different branches, thereby bolstering the representational capacity of unlabeled data. Our proposed approach achieves state-of-the-art performance on three benchmarks: the multi-modality abdominal multi-organ segmentation challenge 2022 dataset (Amos 2022), the left atrium dataset (LA), and the brain tumor segmentation challenge 2020 dataset (BraTS 2020). AutoPromptSeg achieves Dice Score of 68.78% on Amos 2022, 90.02% on LA, and 86.63% on BraTS 2020 under only 10% labeled data setting, demonstrating the excellent performance of our semi-supervised learning framework in limited annotated data.
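Note: the abstract does not describe how the Decoupled Uncertainty Prompt Generator derives its prompts. As a generic, hedged illustration of turning a model's own uncertainty into SAM-style point prompts, the sketch below scores pixels by high foreground probability and low predictive entropy and returns the top coordinates; the scoring rule and function name are assumptions, not the paper's method.

```python
import torch

def entropy_point_prompts(prob_map, num_points=3, eps=1e-6):
    """Pick confident foreground locations as point prompts for a promptable segmenter.

    prob_map: predicted foreground probabilities from an auxiliary branch, shape (H, W).
    Returns (y, x) coordinates of the `num_points` pixels with high probability
    and low binary predictive entropy.
    """
    entropy = -(prob_map * (prob_map + eps).log()
                + (1 - prob_map) * (1 - prob_map + eps).log())
    score = prob_map * (1 - entropy)              # confident-and-foreground score
    flat_idx = torch.topk(score.flatten(), num_points).indices
    w = prob_map.shape[1]
    ys, xs = flat_idx // w, flat_idx % w
    return torch.stack([ys, xs], dim=1)

prompts = entropy_point_prompts(torch.rand(128, 128))
print(prompts)   # pixel coordinates to feed to the prompt encoder
```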
Citations: 0
DFMFI: ultrasound breast cancer detection method based on dynamic fusion multi-scale feature interaction model
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-16 DOI: 10.1016/j.compmedimag.2026.102710
Chenbin Ma, Haonan Zhang, Lishuang Guo
Ultrasound imaging has become an important method of breast cancer screening due to its non-invasive, low-cost, and ionizing radiation-free characteristics. However, the complexity and uncertainty of ultrasound images (such as speckle noise, morphological diversity of lesion areas, and inter-class similarity) pose challenges to traditional computer-aided diagnosis systems. In response to these issues, this paper proposes a Dynamic Fusion Multi-Scale Feature Interaction Model (DFMFI), specifically designed for the task of benign and malignant breast cancer detection in ultrasound imaging. DFMFI enhances the model's ability to model complex lesion features by combining dynamic feature fusion, multi-scale feature aggregation, and nonlinear dynamic interaction mechanisms. The model includes three core modules: the dynamic feature mixer uses overlapped spatial reduction attention and dynamic depth convolution to efficiently integrate global and local information; the efficient multi-scale feature aggregator captures multi-scale lesion features through a multi-branch structure; the dynamic gated feed forward network enhances the adaptability of feature flow through gating mechanisms and nonlinear reconstruction. Experimental results show that DFMFI significantly outperforms existing methods in terms of classification accuracy, robustness, and computational efficiency, providing an efficient and robust solution for the early screening and diagnosis of breast cancer.
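Note: the efficient multi-scale feature aggregator is described only as a multi-branch structure. The sketch below is a generic stand-in for that idea, not the paper's module: parallel convolutions with different kernel sizes, concatenated and fused by a 1x1 convolution, with a residual connection.

```python
import torch
import torch.nn as nn

class MultiScaleAggregator(nn.Module):
    """Parallel convolutional branches with different receptive fields, fused by a
    1x1 convolution; a common way to capture lesions of widely varying size."""

    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (1, 3, 5, 7)
        ])
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale) + x   # residual connection keeps original detail

block = MultiScaleAggregator(channels=32)
print(block(torch.randn(1, 32, 56, 56)).shape)   # torch.Size([1, 32, 56, 56])
```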
Citations: 0
Spectral attribute reasoning for interpretable multi-modal pathological segmentation
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-14 DOI: 10.1016/j.compmedimag.2026.102707
Lixin Zhang, Qian Wang, Zhao Chen, Ying Chen
Accurate segmentation of diverse histological entities is fundamental in computational pathology and critical for clinical diagnosis. Advances in microscopic imaging provide complementary information; in particular, microscopic hyperspectral images (MHSIs) capture pathological differences through distinct spectral signatures, while RGB images offer high-resolution spatial and texture details. However, most multi-modal methods emphasize representation learning and modality alignment while offering limited insight into how the modalities interact to inform segmentation. This lack of explicit reasoning limits interpretability, and existing approaches, largely based on text prompts or spatial patterns, fail to exploit the pathology-relevant spectral signatures in MHSIs. To address these gaps, we propose Pisa-Net, a Pathology-Interpretable Spectral Attribute Learning Network for MHSI–RGB segmentation. Pisa-Net performs interpretable spectral reasoning through knowledge-driven attribute learning, incorporating pathology knowledge via pathologist-selected spectral signatures from key histological entities. These spectral attributes and the MHSI inputs are encoded through a frequency-domain representation into attribute embeddings and MHSI representations, whose similarities provide explicit pathology-grounded spectral evidence. The frequency components are further decomposed into low-, mid-, and high-frequency ranges and adaptively re-weighted via learned phase and magnitude, enabling the model to capture global semantics, structural patterns, and fine discriminative details. Guided by this spectral evidence, Pisa-Net integrates RGB and MHSI features through sparse spatial compression, ensuring that multi-modal fusion remains consistent with the underlying pathological reasoning. Experiments on public multi-modal pathology datasets demonstrate that Pisa-Net achieves superior segmentation performance in cells, glands, and tumors while improving interpretability by explicitly linking predictions to spectral evidence aligned with pathology knowledge.
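Note: the frequency-band decomposition can be pictured with the hedged sketch below, in which a feature map's 2D spectrum is split into low, mid, and high radial bands and each band is rescaled by a learned weight before inverting the transform. The paper also adapts phase; only magnitude reweighting over two assumed radial cutoffs is shown here.

```python
import torch
import torch.nn as nn

class FrequencyBandReweight(nn.Module):
    """Split a feature map's 2D spectrum into low/mid/high radial bands and rescale
    each band with a learned weight before transforming back to the spatial domain."""

    def __init__(self, cutoffs=(0.15, 0.40)):
        super().__init__()
        self.cutoffs = cutoffs
        self.band_weights = nn.Parameter(torch.ones(3))   # low, mid, high

    def forward(self, x):                                  # x: (N, C, H, W)
        n, c, h, w = x.shape
        spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, h),
                                torch.linspace(-0.5, 0.5, w), indexing="ij")
        radius = (yy ** 2 + xx ** 2).sqrt().to(x.device)
        low = (radius <= self.cutoffs[0]).float()
        mid = ((radius > self.cutoffs[0]) & (radius <= self.cutoffs[1])).float()
        high = (radius > self.cutoffs[1]).float()
        mask = (self.band_weights[0] * low
                + self.band_weights[1] * mid
                + self.band_weights[2] * high)
        out = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1)))
        return out.real

block = FrequencyBandReweight()
print(block(torch.randn(1, 8, 64, 64)).shape)   # torch.Size([1, 8, 64, 64])
```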
Citations: 0
The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-13 DOI: 10.1016/j.compmedimag.2026.102702
Wietske A.P. Bastiaansen, Melek Rousian, Anton H.J. Koning, Wiro J. Niessen, Bernadette S. de Bakker, Régine P.M. Steegers-Theunissen, Stefan Klein
Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.
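Note: the key training idea, penalizing deviations of the atlas from an age-matched initial atlas, can be written as a two-term loss. The sketch below is a simplified reading with an assumed mean-squared similarity and weight `lam`; the actual method operates inside a deep groupwise-registration framework and is not reproduced here.

```python
import torch

def atlas_training_loss(warped_image, atlas_at_t, initial_atlas_at_t, lam=0.1):
    """Similarity between a registered image and the atlas at its gestational age,
    plus a penalty keeping that atlas close to an age-matched initial atlas so
    rapidly changing anatomy is not averaged away."""
    similarity = torch.mean((warped_image - atlas_at_t) ** 2)          # assumed MSE term
    deviation_penalty = torch.mean((atlas_at_t - initial_atlas_at_t) ** 2)
    return similarity + lam * deviation_penalty

# Dummy 3D volumes (D, H, W) for one subject at one gestational age
loss = atlas_training_loss(torch.rand(32, 32, 32),
                           torch.rand(32, 32, 32, requires_grad=True),
                           torch.rand(32, 32, 32))
loss.backward()
```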
Citations: 0
TGIAlign: Text-guided dual-branch bidirectional framework for cross-modal semantic alignment in medical vision-language
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-13 DOI: 10.1016/j.compmedimag.2025.102694
Wenhua Li, Lifang Wang, Min Zhao, Xingzhang Lü, Linwen Yi
Medical image–text alignment remains challenging due to subtle lesion patterns, heterogeneous vision–language semantics, and the lack of lesion-aware guidance during visual encoding. Existing methods typically introduce textual information only after visual features have been computed, leaving early and mid-level representations insufficiently conditioned on diagnostic semantics. This limits the model’s ability to capture fine-grained abnormalities and maintain stable alignment across heterogeneous chest X-ray datasets. To address these limitations, we propose TGIAlign, a text-guided dual-branch bidirectional alignment framework that applies structured, lesion-centric cues to intermediate visual representations obtained from the frozen encoder. A large language model (LLM) is used to extract normalized, attribute-based lesion descriptions, providing consistent semantic guidance across samples. These cues are incorporated through the Text-Guided Image Feature Weighting (TGIF) module, which reweights intermediate feature outputs using similarity-derived weights, enabling multi-scale semantic conditioning without modifying the frozen backbone. To capture complementary visual cues, TGIAlign integrates multi-scale text-guided features with high-level visual representations through a Dual-Branch Bidirectional Alignment (DBBA) mechanism. Experiments on six public chest X-ray datasets demonstrate that TGIAlign achieves stable top-K retrieval and reliable text-guided lesion localization, highlighting the effectiveness of early semantic conditioning combined with dual-branch alignment for improving medical vision–language correspondence within chest X-ray settings.
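Note: the TGIF module is described as reweighting intermediate visual features with similarity-derived weights. A minimal sketch of that idea follows; the linear projection, cosine similarity, and sigmoid gating are assumptions used to make the example runnable, not the paper's exact operations.

```python
import torch
import torch.nn.functional as F

def text_guided_reweight(feat, text_emb, proj):
    """Reweight spatial feature tokens by their cosine similarity to a text embedding.

    feat:     visual feature map, shape (N, C, H, W)
    text_emb: pooled text embedding, shape (N, D)
    proj:     linear layer mapping C -> D so the two spaces are comparable
    """
    n, c, h, w = feat.shape
    tokens = feat.flatten(2).transpose(1, 2)                 # (N, H*W, C)
    sim = F.cosine_similarity(proj(tokens), text_emb.unsqueeze(1), dim=-1)  # (N, H*W)
    weights = torch.sigmoid(sim).unsqueeze(1).reshape(n, 1, h, w)
    return feat * weights                                    # lesion-relevant regions boosted

proj = torch.nn.Linear(64, 128)
out = text_guided_reweight(torch.randn(2, 64, 16, 16), torch.randn(2, 128), proj)
print(out.shape)   # torch.Size([2, 64, 16, 16])
```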
Citations: 0
TA-MedSAM: Text-augmented improved MedSAM for pulmonary lesion segmentation
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-12 DOI: 10.1016/j.compmedimag.2026.102698
Siyuan Tang, Siriguleng Wang, Gang Xiang, Jinliang Zhao, Yuxin Wang
Accurate segmentation of lung lesions is critical for clinical diagnosis. Traditional methods rely solely on unimodal visual data, which limits the performance of existing medical image segmentation models. This paper introduces a novel approach, the Text-Augmented Medical Segment Anything Module (TA-MedSAM), which enhances cross-modal representation capabilities through a vision-language fusion paradigm. This method significantly improves segmentation accuracy for pulmonary lesions with challenging characteristics including low contrast, blurred boundaries, complex morphology, and small size. Firstly, we introduce a lightweight Medical Segment Anything Model (MedSAM) image encoder and a pre-trained ClinicalBERT text encoder to extract visual and textual features; this design preserves segmentation performance while reducing model parameters and computational costs, thereby enhancing inference speed. Secondly, a Reconstruction Text Module is proposed to focus the model on lesion-centric textual cues, strengthening semantic guidance for segmentation. Thirdly, we develop an effective Multimodal Feature Fusion Module that integrates visual and textual features using attention mechanisms and introduce a feature alignment coordination mechanism to mutually enhance heterogeneous information across modalities; a Dynamic Perception Learning Mechanism is further proposed to quantitatively evaluate fusion effectiveness, enabling optimal fused feature selection for improved segmentation accuracy. Finally, a Multi-scale Feature Fusion Module combined with a Multi-task Loss Function enhances segmentation performance for complex regions. Comparative experiments demonstrate that TA-MedSAM outperforms state-of-the-art unimodal and multimodal methods on QaTa-COV19, MosMedData+, and a private dataset. Extensive ablation studies validate the efficacy of our proposed components and optimal hyperparameter combinations.
Citations: 0
ThyFusionNet: A CNN–transformer framework with spatial aware sparse attention for multi modal thyroid disease diagnosis
IF 4.9 Zone 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-11 DOI: 10.1016/j.compmedimag.2026.102706
Bing Yang, Jun Li, Junyang Chen, Yutong Huang, Nanbo Xu, Qiurui Liu, Jiaxin Liu, Yuheng Zhou
In medical image analysis, accurately diagnosing complex lesions remains a formidable challenge, especially for thyroid disorders, which exhibit high incidence and intricate pathology. To enhance diagnostic precision and robustness, we assembled ThyM3, a large-scale multimodal dataset comprising thyroid computed tomography and ultrasound images. Building on this resource, we introduce ThyFusionNet, a novel deep-learning architecture that combines convolutional backbones with transformer modules and performs feature-level fusion to exploit complementary cues across modalities. To improve semantic alignment and spatial modeling, we incorporate head-wise positional encodings and an adaptive sparse attention scheme that suppresses redundant activations while highlighting key features. Skip connections are used to retain low-level details, and a gated-attention fusion block further enriches cross-modal interaction. We also propose an adaptive contrastive-entropy loss that preserves feature consistency and simultaneously enhances prediction discriminability and stability. Extensive experiments demonstrate that ThyFusionNet surpasses current leading methods in accuracy, robustness, and generalization, underscoring its strong potential for clinical deployment.
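Note: gated fusion of two modality embeddings can be illustrated with the minimal sketch below, where a learned gate decides, per feature, how much weight to give the CT branch versus the ultrasound branch. This is a generic gating block under assumed shapes, not the paper's exact gated-attention fusion module.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse two modality embeddings with a learned gate: the gate decides, per
    feature, how much to trust the CT branch versus the ultrasound branch."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, feat_ct, feat_us):                 # both (N, dim)
        g = self.gate(torch.cat([feat_ct, feat_us], dim=-1))
        return g * feat_ct + (1 - g) * feat_us

fusion = GatedFusion(dim=256)
fused = fusion(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)   # torch.Size([4, 256])
```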
Citations: 0