SGRRG: Leveraging radiology scene graphs for improved and abnormality-aware radiology report generation
Pub Date: 2025-10-01 | Epub Date: 2025-09-15 | DOI: 10.1016/j.compmedimag.2025.102644
Jun Wang, Lixing Zhu, Abhir Bhalerao, Yulan He
Radiology report generation (RRG) methods often lack sufficient medical knowledge to produce clinically accurate reports. A scene graph provides comprehensive information for describing objects within an image. However, automatically generated radiology scene graphs (RSGs) may contain noisy annotations and highly overlapping regions, posing challenges to using them to enhance RRG. To this end, we propose Scene Graph aided RRG (SGRRG), a framework that leverages an automatically generated RSG and copes with the noisy supervision in the RSG through a transformer-based module, effectively distilling medical knowledge in an end-to-end manner. SGRRG is composed of a dedicated scene graph encoder that translates the radiograph into an RSG, and a scene graph-aided decoder that takes advantage of both patch-level and region-level visual information while mitigating the noisy-annotation problem in the RSG. The incorporation of both patch-level and region-level features, alongside the integration of the essential RSG construction modules, enhances the framework's flexibility and robustness, enabling it to readily exploit prior advanced RRG techniques. A fine-grained, sentence-level attention method is designed to better distill the RSG information. Additionally, we introduce two proxy tasks to enhance the model's ability to produce clinically accurate reports. Extensive experiments demonstrate that SGRRG outperforms previous state-of-the-art methods in report generation and better captures abnormal findings. Code is available at https://github.com/Markin-Wang/SGRRG.
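As an illustration of how such sentence-level distillation might look, the sketch below cross-attends one query per decoded sentence over pooled RSG region features, with a padding mask available to suppress unreliable regions; the module name, shapes, and masking choice are our assumptions, not the released SGRRG code.

```python
import torch
import torch.nn as nn

class SentenceLevelAttention(nn.Module):
    """Hypothetical sketch: each sentence attends over RSG region features,
    so only the relevant sub-graph informs that sentence."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, sent_emb, region_feats, region_mask=None):
        # sent_emb:     (B, S, d) one query vector per sentence
        # region_feats: (B, R, d) pooled features of RSG regions
        # region_mask:  (B, R) bool, True where a region is padded or unreliable
        distilled, weights = self.attn(sent_emb, region_feats, region_feats,
                                       key_padding_mask=region_mask)
        return distilled, weights  # weights expose which regions each sentence used
```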
{"title":"SGRRG: Leveraging radiology scene graphs for improved and abnormality-aware radiology report generation","authors":"Jun Wang , Lixing Zhu , Abhir Bhalerao , Yulan He","doi":"10.1016/j.compmedimag.2025.102644","DOIUrl":"10.1016/j.compmedimag.2025.102644","url":null,"abstract":"<div><div>Radiology report generation (RRG) methods often lack sufficient medical knowledge to produce clinically accurate reports. A scene graph provides comprehensive information for describing objects within an image. However, automatically generated radiology scene graphs (RSG) may contain noise annotations and highly overlapping regions, posing challenges in utilizing RSG to enhance RRG. To this end, we propose Scene Graph aided RRG (SGRRG), a framework that leverages an automatically generated RSG and copes with noisy supervision problems in the RSG with a transformer-based module, effectively distilling medical knowledge in an end-to-end manner. SGRRG is composed of a dedicated scene graph encoder responsible for translating the radiography into a RSG, and a scene graph-aided decoder that takes advantage of both patch-level and region-level visual information and mitigates the noisy annotation problem in the RSG. The incorporation of both patch-level and region-level features, alongside the integration of the essential RSG construction modules, enhances our framework’s flexibility and robustness, enabling it to readily exploit prior advanced RRG techniques. A fine-grained, sentence-level attention method is designed to better distill the RSG information. Additionally, we introduce two proxy tasks to enhance the model’s ability to produce clinically accurate reports. Extensive experiments demonstrate that SGRRG outperforms previous state-of-the-art methods in report generation and can better capture abnormal findings. Code is available at <span><span>https://github.com/Markin-Wang/SGRRG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102644"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collect vascular specimens in one cabinet: A hierarchical prompt-guided universal model for 3D vascular segmentation
Pub Date: 2025-10-01 | Epub Date: 2025-09-26 | DOI: 10.1016/j.compmedimag.2025.102650
Yinuo Wang, Cai Meng, Zhe Xu
Accurate segmentation of vascular structures in volumetric medical images is critical for disease diagnosis and surgical planning. While deep neural networks have shown remarkable effectiveness, existing methods often rely on separate models tailored to specific modalities and anatomical regions, resulting in redundant parameters and limited generalization. Recent universal models address broader segmentation tasks but struggle with the unique challenges of vascular structures. To overcome these limitations, we first present VasBench, a new comprehensive vascular segmentation benchmark comprising nine sub-datasets spanning diverse modalities and anatomical regions. Building on this foundation, we introduce VasCab, a novel prompt-guided universal model for volumetric vascular segmentation, designed to "collect vascular specimens in one cabinet". Specifically, VasCab is equipped with learnable domain and topology prompts to capture shared and unique vascular characteristics across diverse data domains, complemented by a morphology perceptual loss that addresses complex morphological variations. Experimental results demonstrate that VasCab surpasses individual models and state-of-the-art medical foundation models across all test datasets, showcasing exceptional cross-domain integration and precise modeling of vascular morphological variations. Moreover, VasCab exhibits robust performance in downstream tasks, underscoring its versatility and potential for unified vascular analysis. This study marks a significant step toward universal vascular segmentation, offering a promising solution for unified analysis across heterogeneous datasets. Code and dataset are available at https://github.com/mileswyn/VasCab.
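To make the prompt mechanism concrete, here is a minimal sketch, under our own assumptions about shapes and placement, of how learnable domain and topology prompts could be prepended to the visual token sequence of a shared transformer block:

```python
import torch
import torch.nn as nn

class PromptedEncoderBlock(nn.Module):
    """Illustrative sketch (not the authors' code): learnable domain and
    topology prompts are prepended to the visual tokens so one shared
    transformer block conditions on the data domain."""
    def __init__(self, d_model: int, n_domains: int, n_topo: int = 4):
        super().__init__()
        self.domain_prompts = nn.Embedding(n_domains, d_model)  # one per sub-dataset
        self.topo_prompts = nn.Parameter(torch.randn(n_topo, d_model) * 0.02)
        self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

    def forward(self, tokens, domain_id):
        # tokens: (B, N, d) flattened 3D patch tokens; domain_id: (B,) long tensor
        b = tokens.size(0)
        dom = self.domain_prompts(domain_id).unsqueeze(1)        # (B, 1, d)
        topo = self.topo_prompts.unsqueeze(0).expand(b, -1, -1)  # (B, n_topo, d)
        x = self.block(torch.cat([dom, topo, tokens], dim=1))
        return x[:, 1 + self.topo_prompts.size(0):]  # drop the prompt slots
```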
{"title":"Collect vascular specimens in one cabinet: A hierarchical prompt-guided universal model for 3D vascular segmentation","authors":"Yinuo Wang , Cai Meng , Zhe Xu","doi":"10.1016/j.compmedimag.2025.102650","DOIUrl":"10.1016/j.compmedimag.2025.102650","url":null,"abstract":"<div><div>Accurate segmentation of vascular structures in volumetric medical images is critical for disease diagnosis and surgical planning. While deep neural networks have shown remarkable effectiveness, existing methods often rely on separate models tailored to specific modalities and anatomical regions, resulting in redundant parameters and limited generalization. Recent universal models address broader segmentation tasks but struggle with the unique challenges of vascular structures. To overcome these limitations, we first present <strong>VasBench</strong>, a new comprehensive vascular segmentation benchmark comprising nine sub-datasets spanning diverse modalities and anatomical regions. Building on this foundation, we introduce <strong>VasCab</strong>, a novel prompt-guided universal model for volumetric vascular segmentation, designed to “collect vascular specimens in one cabinet”. Specifically, VasCab is equipped with learnable domain and topology prompts to capture shared and unique vascular characteristics across diverse data domains, complemented by morphology perceptual loss to address complex morphological variations. Experimental results demonstrate that VasCab surpasses individual models and state-of-the-art medical foundation models across all test datasets, showcasing exceptional cross-domain integration and precise modeling of vascular morphological variations. Moreover, VasCab exhibits robust performance in downstream tasks, underscoring its versatility and potential for unified vascular analysis. This study marks a significant step toward universal vascular segmentation, offering a promising solution for unified vascular analysis across heterogeneous datasets. Code and dataset are available at <span><span>https://github.com/mileswyn/VasCab</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102650"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145201977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing intracranial vessel segmentation using diffusion models without manual annotation for 3D Time-of-Flight Magnetic Resonance Angiography
Pub Date: 2025-10-01 | Epub Date: 2025-09-30 | DOI: 10.1016/j.compmedimag.2025.102651
Jonghun Kim, Inye Na, Jiwon Chung, Ha-Na Song, Kyungseo Kim, Seongvin Ju, Mi-Yeon Eun, Woo-Keun Seo, Hyunjin Park
Intracranial vessel segmentation is essential for managing brain disorders, facilitating early detection and precise intervention for stroke and aneurysm. Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) is a commonly used vascular imaging technique for segmenting brain vessels. Traditional rule-based MRA segmentation methods are efficient but suffer from instability and poor performance. Deep learning models, including diffusion models, have recently gained attention in medical image segmentation; however, they require ground truth for training, which is labor-intensive and time-consuming to obtain. We propose a novel segmentation method that combines the strengths of rule-based and diffusion models to improve segmentation without relying on explicit labels. Our model adopts a Frangi filter to aid vessel detection and modifies the diffusion model to exclude memory-intensive attention modules for efficiency. Our condition network concatenates the feature maps to further enhance the segmentation process. Quantitative and qualitative evaluations on two datasets demonstrate that our approach not only maintains the integrity of the vascular regions but also substantially reduces noise, offering a robust solution for segmenting intracranial vessels. Our results suggest a basis for improved patient care in disorders involving brain vessels. Our code is available at github.com/jongdory/Vessel-Diffusion.
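The rule-based side of this hybrid is well established; for instance, a Frangi vesselness map for a bright-vessel TOF-MRA volume can be computed with scikit-image as below (the sigma range and normalization are illustrative choices, and the paper's exact conditioning pipeline may differ):

```python
import numpy as np
from skimage.filters import frangi

def vesselness_prior(volume: np.ndarray) -> np.ndarray:
    """Frangi vesselness on a 3D TOF-MRA volume. Vessels are bright in
    TOF-MRA, hence black_ridges=False; sigmas should span expected radii."""
    v = (volume - volume.min()) / (np.ptp(volume) + 1e-8)  # normalize to [0, 1]
    return frangi(v, sigmas=np.arange(0.5, 3.0, 0.5), black_ridges=False)
```

A map like this could serve as the label-free vessel prior that conditions the diffusion model.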
{"title":"Enhancing intracranial vessel segmentation using diffusion models without manual annotation for 3D Time-of-Flight Magnetic Resonance Angiography","authors":"Jonghun Kim , Inye Na , Jiwon Chung , Ha-Na Song , Kyungseo Kim , Seongvin Ju , Mi-Yeon Eun , Woo-Keun Seo , Hyunjin Park","doi":"10.1016/j.compmedimag.2025.102651","DOIUrl":"10.1016/j.compmedimag.2025.102651","url":null,"abstract":"<div><div>Intracranial vessel segmentation is essential for managing brain disorders, facilitating early detection and precise intervention of stroke and aneurysm. Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) is a commonly used vascular imaging technique for segmenting brain vessels. Traditional rule-based MRA segmentation methods were efficient, but suffered from instability and poor performance. Deep learning models, including diffusion models, have recently gained attention in medical image segmentation. However, they require ground truth for training, which is labor-intensive and time-consuming to obtain. We propose a novel segmentation method that combines the strengths of rule-based and diffusion models to improve segmentation without relying on explicit labels. Our model adopts a Frangi filter to help with vessel detection and modifies the diffusion models to exclude memory-intensive attention modules to improve efficiency. Our condition network concatenates the feature maps to further enhance the segmentation process. Quantitative and qualitative evaluations on two datasets demonstrate that our approach not only maintains the integrity of the vascular regions but also substantially reduces noise, offering a robust solution for segmenting intracranial vessels. Our results suggest a basis for improved patient care in disorders involving brain vessels. Our code is available at <span><span>github.com/jongdory/Vessel-Diffusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102651"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145259815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An arbitrary-modal fusion network for volumetric cranial nerves tract segmentation
Pub Date: 2025-10-01 | Epub Date: 2025-08-30 | DOI: 10.1016/j.compmedimag.2025.102635
Lei Xie, Huajun Zhou, Junxiong Huang, Qingrun Zeng, Jiahao Huang, Jianzhong He, Jiawei Zhang, Baohua Fan, Mingchu Li, Guoqiang Xie, Hao Chen, Yuanjing Feng
The segmentation of cranial nerve (CN) tracts provides a valuable quantitative tool for analyzing the morphology and trajectory of individual CNs. Multimodal CN segmentation networks, e.g., CNTSeg, which combine structural Magnetic Resonance Imaging (MRI) and diffusion MRI, have achieved promising segmentation performance. However, collecting complete multimodal data in clinical practice is laborious or even infeasible due to limitations in equipment, user privacy, and working conditions. In this work, we propose a novel arbitrary-modal fusion network for volumetric CN segmentation, called CNTSeg-v2, which trains one model to handle different combinations of available modalities. Instead of directly combining all the modalities, we select T1-weighted (T1w) images as the primary modality, owing to their simple acquisition and their dominant contribution to the results, and use them to supervise the information selection of the other auxiliary modalities. Our model encompasses an Arbitrary-Modal Collaboration Module (ACM) designed to effectively extract informative features from the auxiliary modalities under the supervision of T1w images. Meanwhile, we construct a Deep Distance-guided Multi-stage (DDM) decoder that corrects small errors and discontinuities through signed distance maps to improve segmentation accuracy. We evaluate CNTSeg-v2 on the Human Connectome Project (HCP) dataset and the clinical Multi-shell Diffusion MRI (MDM) dataset. Extensive experimental results show that CNTSeg-v2 achieves state-of-the-art segmentation performance, outperforming all competing methods.
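As a reference for the distance-guided supervision, a signed distance map of a binary mask can be derived from two Euclidean distance transforms; the sign convention below (negative inside, positive outside) is our assumption:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask: np.ndarray) -> np.ndarray:
    """Signed distance map of a binary CN mask: negative inside the
    structure, positive outside (sign convention is an assumption)."""
    mask = mask.astype(bool)
    inside = distance_transform_edt(mask)    # distance to background, nonzero inside
    outside = distance_transform_edt(~mask)  # distance to foreground, nonzero outside
    return outside - inside
```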
{"title":"An arbitrary-modal fusion network for volumetric cranial nerves tract segmentation","authors":"Lei Xie , Huajun Zhou , Junxiong Huang , Qingrun Zeng , Jiahao Huang , Jianzhong He , Jiawei Zhang , Baohua Fan , Mingchu Li , Guoqiang Xie , Hao Chen , Yuanjing Feng","doi":"10.1016/j.compmedimag.2025.102635","DOIUrl":"10.1016/j.compmedimag.2025.102635","url":null,"abstract":"<div><div>The segmentation of cranial nerves (CNs) tract provides a valuable quantitative tool for the analysis of the morphology and trajectory of individual CNs. Multimodal CN segmentation networks, e.g., CNTSeg, which combine structural Magnetic Resonance Imaging (MRI) and diffusion MRI, have achieved promising segmentation performance. However, it is laborious or even infeasible to collect complete multimodal data in clinical practice due to limitations in equipment, user privacy, and working conditions. In this work, we propose a novel arbitrary-modal fusion network for volumetric CN segmentation, called CNTSeg-v2, which trains one model to handle different combinations of available modalities. Instead of directly combining all the modalities, we select T1-weighted (T1w) images as the primary modality due to its simplicity in data acquisition and contribution most to the results, which supervises the information selection of other auxiliary modalities. Our model encompasses an Arbitrary-Modal Collaboration Module (ACM) designed to effectively extract informative features from other auxiliary modalities, guided by the supervision of T1w images. Meanwhile, we construct a Deep Distance-guided Multi-stage (DDM) decoder to correct small errors and discontinuities through signed distance maps to improve segmentation accuracy. We evaluate our CNTSeg-v2 on the Human Connectome Project (HCP) dataset and the clinical Multi-shell Diffusion MRI (MDM) dataset. Extensive experimental results show that our CNTSeg-v2 achieves state-of-the-art segmentation performance, outperforming all competing methods.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102635"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144989357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A self-attention model for robust rigid slice-to-volume registration of functional MRI
Pub Date: 2025-10-01 | Epub Date: 2025-09-13 | DOI: 10.1016/j.compmedimag.2025.102643
Samah Khawaled, Onur Afacan, Simon K. Warfield, Moti Freiman
Functional Magnetic Resonance Imaging (fMRI) is vital in neuroscience, enabling investigations into brain disorders, treatment monitoring, and brain function mapping. However, head motion during fMRI scans, occurring between shots of slice acquisition, can result in distortion, biased analyses, and increased costs due to the need for scan repetitions. Therefore, retrospective slice-level motion correction through slice-to-volume registration (SVR) is crucial. Previous studies have utilized deep learning (DL) based models to address the SVR task; however, they overlooked the uncertainty stemming from the input stack of slices and did not assign weighting or scoring to each slice. Treating all slices equally ignores the variability in their relevance, leading to suboptimal predictions. In this work, we introduce an end-to-end SVR model for aligning 2D fMRI slices with a 3D reference volume, incorporating a self-attention mechanism to enhance robustness against input data variations and uncertainties. Our SVR model utilizes independent slice and volume encoders and a self-attention module to assign pixel-wise scores for each slice. We used the publicly available Healthy Brain Network (HBN) dataset, splitting the volumes into training (64%), validation (16%), and test (20%) sets. To conduct the simulated motion study, we synthesized rigid transformations across a wide range of parameters and applied them to the reference volumes. Slices were then sampled according to the acquisition protocol to generate 2,000, 500, and 200 3D volume–2D slice pairs for the training, validation, and test sets, respectively. Our experimental results demonstrate that our model achieves competitive alignment accuracy compared to state-of-the-art deep learning-based methods (Euclidean distance of 0.93 mm vs. 1.86 mm; paired t-test, p < 0.03). Furthermore, our approach registers faster than conventional iterative methods (0.096 s vs. 1.17 s). Our end-to-end SVR model facilitates real-time head motion tracking during fMRI acquisition, ensuring reliability and robustness against uncertainties in the inputs.
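One plausible realization of the pixel-wise slice scoring, sketched below with our own module layout (the paper's exact attention wiring may differ), attends slice tokens to volume tokens and maps the result to per-token reliability weights:

```python
import torch
import torch.nn as nn

class SliceScorer(nn.Module):
    """Hypothetical sketch: attend slice-encoder tokens to volume-encoder
    tokens, then map the attended features to per-pixel reliability scores
    that down-weight uninformative slices before the rigid regression."""
    def __init__(self, d_model: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.score_head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, slice_tokens, vol_tokens):
        # slice_tokens: (B, P, d) one token per slice pixel/patch
        # vol_tokens:   (B, V, d) tokens from the 3D reference volume
        attended, _ = self.attn(slice_tokens, vol_tokens, vol_tokens)
        scores = self.score_head(attended)   # (B, P, 1) in [0, 1]
        return slice_tokens * scores, scores  # re-weighted features, scores
```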
Inference time correction based on confidence and uncertainty for improved deep-learning model performance and explainability in medical image classification
Pub Date: 2025-10-01 | Epub Date: 2025-08-13 | DOI: 10.1016/j.compmedimag.2025.102630
Joel Jeffrey, Ashwin RajKumar, Sudhanshu Pandey, Lokesh Bathala, Phaneendra K. Yalavarthy
The major challenges faced by artificial intelligence (AI) models for medical image analysis are class imbalance in the training data and limited explainability. This study introduces the Confidence and Entropy-based Uncertainty Thresholding Algorithm (CEbUTAl), a novel post-processing method designed to enhance both model performance and explainability. CEbUTAl modifies model predictions during inference, based on uncertainty and confidence measures, to improve classification in scenarios with class imbalance. CEbUTAl's inference-time correction addresses explainability while simultaneously improving performance, contrary to the prevailing notion that explainability necessitates a compromise in performance. The algorithm was evaluated across five medical imaging tasks: intracranial hemorrhage detection, optical coherence tomography analysis, breast cancer detection, carpal tunnel syndrome detection, and multi-class skin lesion classification. Results demonstrate that CEbUTAl improves accuracy by approximately 5% and increases sensitivity across multiple deep learning architectures, loss functions, and tasks. Comparative studies indicate that CEbUTAl outperforms state-of-the-art methods in addressing class imbalance and quantifying uncertainty. The model-agnostic, task-agnostic, and post-processing nature of CEbUTAl makes it appealing for enhancing both performance and trustworthiness in medical image analysis. This study provides a generalizable approach to mitigate biases arising from class imbalance while improving the explainability of AI models, thus increasing their utility in clinical practice.
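While the published algorithm's exact decision rule is not reproduced here, an inference-time correction in this spirit can be sketched in a few lines: compute top-1 confidence and normalized predictive entropy, flag uncertain predictions, and override them (here, toward an assumed minority class; the thresholds are placeholders):

```python
import math
import torch
import torch.nn.functional as F

def uncertainty_corrected_predictions(logits, conf_thresh=0.9, ent_thresh=0.5,
                                      minority_class=1):
    """Illustrative sketch of confidence/entropy-based inference-time
    correction; thresholds and the flip-to-minority rule are assumptions."""
    probs = F.softmax(logits, dim=-1)                      # (B, C)
    conf, pred = probs.max(dim=-1)                         # top-1 confidence
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(-1)  # predictive entropy
    ent = ent / math.log(probs.size(-1))                   # normalize to [0, 1]
    uncertain = (conf < conf_thresh) & (ent > ent_thresh)
    pred = torch.where(uncertain, torch.full_like(pred, minority_class), pred)
    return pred, uncertain
```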
{"title":"Inference time correction based on confidence and uncertainty for improved deep-learning model performance and explainability in medical image classification","authors":"Joel Jeffrey , Ashwin RajKumar , Sudhanshu Pandey , Lokesh Bathala , Phaneendra K. Yalavarthy","doi":"10.1016/j.compmedimag.2025.102630","DOIUrl":"10.1016/j.compmedimag.2025.102630","url":null,"abstract":"<div><div>The major challenge faced by artificial intelligence (AI) models for medical image analysis is the class imbalance of training data and limited explainability. This study introduces a Confidence and Entropy-based Uncertainty Thresholding Algorithm (CEbUTAl), which is a novel post-processing method, designed to enhance both model performance and explainability. CEbUTAl modifies model predictions during inference, based on uncertainty and confidence measures, to improve classification in scenarios with class imbalance. CEbUTAl’s inference-time correction addresses explainability, while simultaneously improving performance, contrary to the prevailing notion that explainability necessitates a compromise in performance. The algorithm was evaluated across five medical imaging tasks: intracranial hemorrhage detection, optical coherence tomography analysis, breast cancer detection, carpal tunnel syndrome detection, and multi-class skin lesion classification. Results demonstrate that CEbUTAl improves accuracy by approximately 5% and increases sensitivity across multiple deep learning architectures, loss functions, and tasks. Comparative studies indicate that CEbUTAl outperforms state-of-the-art methods in addressing class imbalance and quantifying uncertainty. The model-agnostic, task-agnostic and post-processing nature of CEbUTAl makes it appealing for enhancing both performance and trustworthiness in medical image analysis. This study provides a generalizable approach to mitigate biases arising from class imbalance, while improving the explainability of AI models, thus increasing their utility in clinical practice.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102630"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144893305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unveiling hidden risks: A Holistically-Driven Weak Supervision framework for ultra-short-term ACS prediction using CCTA
Pub Date: 2025-10-01 | Epub Date: 2025-09-15 | DOI: 10.1016/j.compmedimag.2025.102636
Zhen Liu, Bangkang Fu, Jiahui Mao, Junjie He, Jiangyue Xiang, Hongjin Li, Yunsong Peng, Bangguo Li, Rongpin Wang
This paper proposes MH-STR, a novel end-to-end framework for predicting the three-month risk of Acute Coronary Syndrome (ACS) from Coronary CT Angiography (CCTA) images. The model combines hybrid attention mechanisms with convolutional networks to capture subtle and irregular lesion patterns that are difficult to detect visually. A stage-wise transfer learning strategy helps distill general features and transfer vascular-specific knowledge. To reconcile feature scale mismatches in the dual-branch architecture, we introduce a wavelet-based multi-scale fusion module for effective integration across scales. Experiments show that MH-STR achieves an AUC of 0.834, an F1 score of 0.82, and a precision of 0.92, outperforming existing methods and highlighting its potential for improving ACS risk prediction.
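A wavelet fusion step of the kind described can be sketched with PyWavelets: decompose each branch's feature map, blend the low-frequency bands, and keep the strongest high-frequency detail. The mixing rule, and resizing both maps to a common resolution beforehand, are our assumptions:

```python
import numpy as np
import pywt

def wavelet_fuse(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Fuse two same-size 2D feature maps in the wavelet domain: average
    the approximation bands, keep the stronger detail coefficients."""
    a_low, a_high = pywt.dwt2(feat_a, 'haar')   # a_high = (cH, cV, cD)
    b_low, b_high = pywt.dwt2(feat_b, 'haar')
    low = 0.5 * (a_low + b_low)                 # blend coarse structure
    high = tuple(np.where(np.abs(x) >= np.abs(y), x, y)  # strongest detail wins
                 for x, y in zip(a_high, b_high))
    return pywt.idwt2((low, high), 'haar')
```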
{"title":"Unveiling hidden risks: A Holistically-Driven Weak Supervision framework for ultra-short-term ACS prediction using CCTA","authors":"Zhen Liu , Bangkang Fu , Jiahui Mao , Junjie He , Jiangyue Xiang , Hongjin Li , Yunsong Peng , Bangguo Li , Rongpin Wang","doi":"10.1016/j.compmedimag.2025.102636","DOIUrl":"10.1016/j.compmedimag.2025.102636","url":null,"abstract":"<div><div>This paper proposes MH-STR, a novel end-to-end framework for predicting the three-month risk of Acute Coronary Syndrome (ACS) from Coronary CT Angiography (CCTA) images. The model combines hybrid attention mechanisms with convolutional networks to capture subtle and irregular lesion patterns that are difficult to detect visually. A stage-wise transfer learning strategy helps distill general features and transfer vascular-specific knowledge. To reconcile feature scale mismatches in the dual-branch architecture, we introduce a wavelet-based multi-scale fusion module for effective integration across scales. Experiments show that MH-STR achieves an AUC of 0.834, an F1 score of 0.82, and a precision of 0.92, outperforming existing methods and highlighting its potential for improving ACS risk prediction.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102636"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145088019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SSIFNet: Spatial–temporal stereo information fusion network for self-supervised surgical video inpainting
Pub Date: 2025-10-01 | Epub Date: 2025-08-25 | DOI: 10.1016/j.compmedimag.2025.102622
Xiaoyang Zou, Zhuyuan Zhang, Derong Yu, Wenyuan Sun, Wenyong Liu, Donghua Hang, Wei Bao, Guoyan Zheng
During minimally invasive robot-assisted surgical procedures, surgeons rely on stereo endoscopes for image guidance. Nevertheless, the field-of-view is typically restricted owing to the limited size of the endoscope and the constrained workspace. This visualization challenge becomes even more severe when surgical instruments are inserted into the already restricted field-of-view, where important anatomical landmarks and relevant clinical content may become occluded by the inserted instruments. To address this challenge, we propose a novel end-to-end trainable spatial–temporal stereo information fusion network, referred to as SSIFNet, for inpainting the surgical scene under instrument occlusions in robot-assisted endoscopic surgery. SSIFNet features three essential modules: a novel optical flow-guided deformable feature propagation (OFDFP) module, a novel spatial–temporal stereo focal transformer (S²FT)-based information fusion module, and a novel stereo-consistency enforcement (SE) module. These three modules work synergistically to inpaint occluded regions in the surgical scene. More importantly, SSIFNet is trained in a self-supervised manner on simulated occlusions with a novel loss function that combines flow completion, disparity matching, cross-warping consistency, warping consistency, image, and adversarial loss terms to generate high-fidelity, accurate occlusion reconstructions in both views. After training, the model can be applied directly to surgical videos with true instrument occlusions, producing results that are not only spatially and temporally consistent but also stereo-consistent. Comprehensive quantitative and qualitative experimental results demonstrate that SSIFNet outperforms state-of-the-art (SOTA) video inpainting methods. The source code of this study will be released at https://github.com/SHAUNZXY/SSIFNet.
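At the data level, the self-supervised setup with simulated occlusions reduces to compositing synthetic instrument masks onto clean stereo frames and supervising reconstruction of the known pixels underneath; a minimal sketch (tensor layout and mask source are assumptions):

```python
import torch

def make_training_pair(frames: torch.Tensor, masks: torch.Tensor):
    """frames: (B, T, C, H, W) clean stereo video; masks: (B, T, 1, H, W)
    in {0, 1}, simulating instrument occlusions. The clean frame is the
    target, so no manual annotation of real occlusions is needed."""
    occluded = frames * (1.0 - masks)  # zero out simulated instrument pixels
    return occluded, masks, frames     # network input, mask, reconstruction target
```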
{"title":"SSIFNet: Spatial–temporal stereo information fusion network for self-supervised surgical video inpainting","authors":"Xiaoyang Zou , Zhuyuan Zhang , Derong Yu , Wenyuan Sun , Wenyong Liu , Donghua Hang , Wei Bao , Guoyan Zheng","doi":"10.1016/j.compmedimag.2025.102622","DOIUrl":"10.1016/j.compmedimag.2025.102622","url":null,"abstract":"<div><div>During minimally invasive robot-assisted surgical procedures, surgeons rely on stereo endoscopes to provide image guidance. Nevertheless, the field-of-view is typically restricted owing to the limited size of the endoscope and constrained workspace. Such a visualization challenge becomes even more severe when surgical instruments are inserted into the already restricted field-of-view, where important anatomical landmarks and relevant clinical contents may become occluded by the inserted instruments. To address the challenge, in this work, we propose a novel end-to-end trainable spatial–temporal stereo information fusion network, referred as SSIFNet, for inpainting surgical videos of surgical scene under instrument occlusions in robot-assisted endoscopic surgery. The proposed SSIFNet features three essential modules including a novel optical flow-guided deformable feature propagation (OFDFP) module, a novel spatial–temporal stereo focal transformer (S<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>FT)-based information fusion module, and a novel stereo-consistency enforcement (SE) module. These three modules work synergistically to inpaint occluded regions in the surgical scene. More importantly, SSIFNet is trained in a self-supervised manner with simulated occlusions by a novel loss function, which is designed to combine flow completion, disparity matching, cross-warping consistency, warping-consistency, image and adversarial loss terms to generate high fidelity and accurate occlusion reconstructions in both views. After training, the trained model can be applied directly to inpainting surgical videos with true instrument occlusions to generate results with not only spatial and temporal consistency but also stereo-consistency. Comprehensive quantitative and qualitative experimental results demonstrate that SSIFNet outperforms state-of-the-art (SOTA) video inpainting methods. The source code of this study will be released at <span><span>https://github.com/SHAUNZXY/SSIFNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102622"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144902402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SA2Net: Scale-adaptive structure-affinity transformation for spine segmentation from ultrasound volume projection imaging
Pub Date: 2025-10-01 | Epub Date: 2025-09-25 | DOI: 10.1016/j.compmedimag.2025.102649
Hao Xie, Zixun Huang, Yushen Zuo, Yakun Ju, Frank H.F. Leung, N.F. Law, Kin-Man Lam, Yong-Ping Zheng, Sai Ho Ling
Spine segmentation based on ultrasound volume projection imaging (VPI) plays a vital role in intelligent scoliosis diagnosis in clinical applications. However, this task faces several significant challenges. Firstly, the global contextual knowledge of spines may not be well learned if the high spatial correlation among different bone features is neglected. Secondly, the spine bones contain rich structural knowledge regarding their shapes and positions, which deserves to be encoded into the segmentation process. To address these challenges, we propose a novel scale-adaptive structure-aware network (SA2Net) for effective spine segmentation. First, we propose a scale-adaptive complementary strategy to learn cross-dimensional long-distance correlation features for spinal images. Second, motivated by the consistency between multi-head self-attention in Transformers and semantic-level affinity, we propose a structure-affinity transformation that transforms semantic features with class-specific affinity and combines them with a Transformer decoder for structure-aware reasoning. In addition, we adopt a feature mixing loss aggregation method to enhance model training, improving the robustness and accuracy of the segmentation process. The experimental results demonstrate that SA2Net achieves superior segmentation performance compared with other state-of-the-art methods. Moreover, the adaptability of SA2Net to various backbones enhances its potential as a promising tool for advanced scoliosis diagnosis using intelligent spinal image analysis.
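The stated link between multi-head self-attention and semantic affinity suggests refining per-class token scores by propagating them through the (row-stochastic) attention matrix; the sketch below follows that common affinity-refinement recipe and is not necessarily the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def structure_affinity_refine(attn: torch.Tensor, class_scores: torch.Tensor):
    """attn: (B, heads, N, N) self-attention weights over N tokens;
    class_scores: (B, K, N) coarse per-class token scores to be refined."""
    affinity = attn.mean(dim=1)                    # average heads -> (B, N, N)
    affinity = F.normalize(affinity, p=1, dim=-1)  # keep rows summing to 1
    # each token's refined score is an affinity-weighted mix of all tokens
    return torch.einsum('bkn,bnm->bkm', class_scores, affinity)
```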
{"title":"SA2Net: Scale-adaptive structure-affinity transformation for spine segmentation from ultrasound volume projection imaging","authors":"Hao Xie , Zixun Huang , Yushen Zuo , Yakun Ju , Frank H.F. Leung , N.F. Law , Kin-Man Lam , Yong-Ping Zheng , Sai Ho Ling","doi":"10.1016/j.compmedimag.2025.102649","DOIUrl":"10.1016/j.compmedimag.2025.102649","url":null,"abstract":"<div><div>Spine segmentation, based on ultrasound volume projection imaging (VPI), plays a vital role for intelligent scoliosis diagnosis in clinical applications. However, this task faces several significant challenges. Firstly, the global contextual knowledge of spines may not be well-learned if we neglect the high spatial correlation of different bone features. Secondly, the spine bones contain rich structural knowledge regarding their shapes and positions, which deserves to be encoded into the segmentation process. To address these challenges, we propose a novel scale-adaptive structure-aware network (SA<sup>2</sup>Net) for effective spine segmentation. First, we propose a scale-adaptive complementary strategy to learn the cross-dimensional long-distance correlation features for spinal images. Second, motivated by the consistency between multi-head self-attention in Transformers and semantic level affinity, we propose structure-affinity transformation to transform semantic features with class-specific affinity and combine it with a Transformer decoder for structure-aware reasoning. In addition, we adopt a feature mixing loss aggregation method to enhance model training. This method improves the robustness and accuracy of the segmentation process. The experimental results demonstrate that our SA<sup>2</sup>Net achieves superior segmentation performance compared to other state-of-the-art methods. Moreover, the adaptability of SA<sup>2</sup>Net to various backbones enhances its potential as a promising tool for advanced scoliosis diagnosis using intelligent spinal image analysis.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102649"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mamba-based context-aware local feature network for vessel detail enhancement
Pub Date: 2025-10-01 | Epub Date: 2025-09-12 | DOI: 10.1016/j.compmedimag.2025.102645
Keyi Han, Anqi Xiao, Jie Tian, Zhenhua Hu
Objective
Blood vessel analysis is essential in various clinical fields. Detailed vascular imaging enables clinicians to assess abnormalities and make timely, effective interventions. Near-infrared-II (NIR-II, 1000–1700 nm) fluorescence imaging offers superior resolution, sensitivity, and deeper tissue visualization, making it highly promising for vascular imaging. However, deep vessels exhibit relatively low contrast, making differentiation challenging, and accurate vessel segmentation remains a difficult task.
Methods
We propose CALFNet, a context-aware local feature network based on the Mamba module, which can segment more vascular detail in low-contrast regions. CALFNet follows a UNet-like architecture overall, with a ResNet-based encoder for extracting local features and a Mamba-based context-aware module in the latent space for global-context awareness (a sketch of this module follows the abstract). By incorporating global vessel context, the network enhances segmentation in locally low-contrast areas, capturing finer vessel structures more effectively. Furthermore, a feature-enhancement module between the encoder and decoder preserves critical local features from the encoder and uses them to further refine vascular details in the decoder's feature representations.
Results
We conducted experiments on two types of clinical datasets, including an NIR-II fluorescent vascular imaging dataset and retinal vessel datasets captured under visible light. The results show that CALFNet outperforms the comparison methods, demonstrating superior robustness and achieving more accurate vessel segmentation, particularly in low-contrast regions.
Conclusion and Significance
CALFNet is an effective vessel segmentation network that more accurately segments vessels within low-contrast regions. It can enhance the capability of NIR-II fluorescence imaging for vascular analysis, providing valuable support for clinical diagnosis and medical intervention.
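For readers wanting a feel for the Mamba-based context-aware module mentioned in the Methods, the sketch below flattens a UNet bottleneck feature map into a sequence, applies a state-space block for global context, and folds it back; it assumes the mamba-ssm package and our own layout choices rather than the authors' code:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed dependency: pip install mamba-ssm

class ContextAwareBottleneck(nn.Module):
    """Global-context mixing at the UNet latent space via a Mamba block."""
    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.mamba = Mamba(d_model=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        seq = seq + self.mamba(self.norm(seq))  # residual state-space scan
        return seq.transpose(1, 2).reshape(b, c, h, w)
```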