Pub Date: 2026-02-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.compmedimag.2026.102720
Xinhong Li , Geng Chen , Yuanfeng Wu , Haotian Jiang , Tao Zhou , Yi Zhou , Wentao Zhu
As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and the objects within it as grid-structured data, paying insufficient attention to the relationships among the objects to be segmented. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During segmentation, our model first represents the image as a graph and then uses the encoder to extract multi-level graph features and image features. Finally, the hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets covering three types of lesions: polyps, skin lesions, and retinal vessels. The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.
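The graph-construction step can be sketched as a minimal k-nearest-neighbour graph over patch features (the toy patch features, Euclidean distance, and choice of k below are illustrative assumptions; the actual ViG encoder operates on learned patch embeddings):

```python
import math

def knn_graph(features, k):
    """Build a k-nearest-neighbour graph over patch feature vectors.

    features: list of equal-length feature vectors (one per image patch).
    Returns an adjacency list mapping each node to its k nearest nodes.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    adjacency = {}
    for i, fi in enumerate(features):
        # Rank all other patches by distance and keep the k closest.
        neighbours = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: dist(fi, features[j]),
        )[:k]
        adjacency[i] = neighbours
    return adjacency

# Four toy 2-D patch features forming two clusters.
patches = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
graph = knn_graph(patches, k=1)
```

Each patch connects to its nearest neighbour within its own cluster, which is the kind of object-level relationship a grid representation does not expose.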
{"title":"Learning geometric and visual features for medical image segmentation with vision GNN","authors":"Xinhong Li , Geng Chen , Yuanfeng Wu , Haotian Jiang , Tao Zhou , Yi Zhou , Wentao Zhu","doi":"10.1016/j.compmedimag.2026.102720","DOIUrl":"10.1016/j.compmedimag.2026.102720","url":null,"abstract":"<div><div>As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and objects within it as grid-structural data, while insufficient attention is given to relationships between the objects to segment. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During the segmentation process, our model first represents the image as a graph and then utilizes the encoder to extract multi-level graph features and image features. Finally, our hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets across three types of lesions: polyps, skin lesions, and retinal vessels. 
The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102720"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.compmedimag.2026.102711
Chao Lin , Jiong-Lin Liang , Jia Guo , Yu-Long Xie , Weidong Cheng , Guo-Heng Huang , Qing-Wen Lin , Jian-Wu Chen , Tong Xiang , Hai-Qiang Mai , Qi Yang
Background
The pretreatment identification of post-radiation nasopharyngeal necrosis (PRNN) combined with recurrent nasopharyngeal carcinoma (referred to as cancer-infiltrative PRNN) is crucial for the diagnosis and treatment of PRNN. As the first study to identify recurrent nasopharyngeal carcinoma in patients with PRNN, we aimed to develop a deep learning (DL)-based predictive model using routine MRI to distinguish cancer-infiltrative PRNN from cancer-free PRNN.
Methods
MRIs of 437 patients with PRNN were manually labeled and randomly divided into training and validation cohorts. A Video Swin Transformer and a multilayer perceptron were employed to construct the DL model. The integrated DL-plus-clinical model (DCcombined model) and the integrated radiomics-plus-clinical model (RCcombined model) were constructed by linear weighted fusion of the predictions of the two constituent models. The predictive value of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity.
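Linear weighted fusion of two models' predictions amounts to a convex combination of their output probabilities; a minimal sketch (the fusion weight and example probabilities below are illustrative assumptions, not values from the paper):

```python
def linear_weighted_fusion(p_model_a, p_model_b, weight_a):
    """Fuse two models' predicted probabilities per case as
    weight_a * p_a + (1 - weight_a) * p_b, with weight_a in [0, 1]."""
    assert 0.0 <= weight_a <= 1.0
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(p_model_a, p_model_b)]

dl_probs = [0.9, 0.2, 0.6]    # deep-learning model outputs (toy values)
clin_probs = [0.7, 0.4, 0.5]  # clinical model outputs (toy values)
fused = linear_weighted_fusion(dl_probs, clin_probs, weight_a=0.5)
```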
Results
The DCcombined model significantly outperformed the radiologists in terms of AUC (0.83 vs. 0.60, p < 0.001), accuracy (0.78 vs. 0.60, p = 0.002), and sensitivity (0.86 vs. 0.62, p = 0.002) in the validation cohort. The DCcombined model showed the highest validation sensitivity of 0.86 (95% CI 0.77–0.94), whereas the RCcombined model demonstrated the highest specificity of 0.88 (95% CI 0.81–0.96).
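The reported accuracy, sensitivity, and specificity follow the standard confusion-matrix definitions; a minimal sketch with toy labels:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (TP rate) and specificity (TN rate)
    from binary ground-truth labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```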
Conclusions
Our DCcombined model based on DL can noninvasively distinguish cancer-infiltrative PRNN from cancer-free PRNN with higher AUC, accuracy, and sensitivity than those of radiologists and better sensitivity than that of the RCcombined model based on radiomics.
{"title":"MRI-based deep learning model predicts recurrent nasopharyngeal carcinoma in post-radiation nasopharyngeal necrosis","authors":"Chao Lin , Jiong-Lin Liang , Jia Guo , Yu-Long Xie , Weidong Cheng , Guo-Heng Huang , Qing-Wen Lin , Jian-Wu Chen , Tong Xiang , Hai-Qiang Mai , Qi Yang","doi":"10.1016/j.compmedimag.2026.102711","DOIUrl":"10.1016/j.compmedimag.2026.102711","url":null,"abstract":"<div><h3>Background</h3><div>The pretreatment identification of post-radiation nasopharyngeal necrosis (PRNN) combined with recurrent nasopharyngeal carcinoma (referred to as cancer-infiltrative PRNN) is crucial for the diagnosis and treatment of PRNN. As the first study to identify recurrent nasopharyngeal carcinoma in patients with PRNN, we aimed to develop a deep learning (DL)-based predictive model using routine MRI to distinguish cancer-infiltrative PRNN from cancer-free PRNN.</div></div><div><h3>Methods</h3><div>MRIs of 437 patients with PRNN were manually labeled and randomly divided into training and validation cohorts. Video Swin Transformer and Multilayer Perceptron were employed to construct the DL model. The integrated DL and clinical model (DC<sub>combined</sub> model) and the integrated radiomics and clinical model (RC<sub>combined</sub> model) were constructed using linear weighted fusion of the prediction results from the two models. The predictive value of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity.</div></div><div><h3>Results</h3><div>The DC<sub>combined</sub> model significantly outperformed the radiologists in terms of AUC (0.83 vs. 0.60, p < 0.001), accuracy (0.78 vs. 0.60, p = 0.002), and sensitivity (0.86 vs. 0.62, p = 0.002) in the validation cohort. 
The DC<sub>combined</sub> model showed the highest validation sensitivity of 0.86 (95 % CI 0.77–0.94), whereas the RC<sub>combined</sub> model demonstrated the highest specificity of 0.88 (95 % CI 0.81–0.96).</div></div><div><h3>Conclusions</h3><div>Our DC<sub>combined</sub> model based on DL can noninvasively distinguish cancer-infiltrative PRNN from cancer-free PRNN with higher AUC, accuracy, and sensitivity than those of radiologists and better sensitivity than that of the RC<sub>combined</sub> model based on radiomics.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102711"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146100664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-26 | DOI: 10.1016/j.compmedimag.2026.102717
Chenhui Qiu , Sarah Miller , Barathi Subramanian , Angela Ryu , Haiyu Zhang , George A. Fisher , Nigam H. Shah , John Mongan , Curtis Langlotz , Peter Poullos , Jeanne Shen
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and a leading cause of cancer-related mortality. This study investigates an automated pipeline for identifying and localizing the primary CRC in portal venous phase contrast-enhanced CT scans, a crucial first step for downstream CRC staging, prognostication, and treatment planning. We propose a deep learning-based automated detection pipeline using YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. Additionally, a scale-adaptive loss function, which introduces an adaptive coefficient and a scaling factor to adaptively measure the Intersection over Union (IoU) and center-point distance, was designed to improve box regression and overall detection performance. The proposed pipeline achieved a recall of 0.8092, precision of 0.8187, and F-1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation), and a recall of 0.9949, precision of 0.9894, and F-1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our pipeline, when trained on the patient-level in-house dataset, obtained a recall of 0.8283, precision of 0.8414, and F-1 score of 0.8348, and, when trained on the slice-level in-house dataset, achieved a recall of 0.6897, precision of 0.7888, and F-1 score of 0.7358, outperforming existing representative detection methods. The superior CRC detection performance on the in-house CT dataset and state-of-the-art generalization on the public dataset (a 31.97-percentage-point improvement in detection sensitivity (recall) over the next-closest state-of-the-art method) highlight the potential translational value of our pipeline for CRC clinical decision support, conditional upon validation in larger cohorts.
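The two geometric quantities the scale-adaptive loss measures, IoU and center-point distance, can be sketched for axis-aligned boxes (the paper's adaptive coefficient and scaling factor are not reproduced here; this shows only the underlying measurements):

```python
import math

def iou_and_center_distance(box_a, box_b):
    """IoU and center-point distance for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Euclidean distance between the two box centers.
    dist = math.hypot((ax1 + ax2) / 2 - (bx1 + bx2) / 2,
                      (ay1 + ay2) / 2 - (by1 + by2) / 2)
    return iou, dist

iou, dist = iou_and_center_distance((0, 0, 2, 2), (1, 0, 3, 2))
```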
{"title":"A deep learning-based automated pipeline for colorectal cancer detection in contrast-enhanced CT images","authors":"Chenhui Qiu , Sarah Miller , Barathi Subramanian , Angela Ryu , Haiyu Zhang , George A. Fisher , Nigam H. Shah , John Mongan , Curtis Langlotz , Peter Poullos , Jeanne Shen","doi":"10.1016/j.compmedimag.2026.102717","DOIUrl":"10.1016/j.compmedimag.2026.102717","url":null,"abstract":"<div><div>Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and a leading cause of cancer-related mortality. This study aims to investigate an automatic detection pipeline for identification and localization of the primary CRC in portal venous phase contrast-enhanced CT scans, which is a crucial first step for downstream CRC staging, prognostication, and treatment planning. We propose a deep learning-based automated detection pipeline using YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. Additionally, a scale-adaptive loss function, which introduces an adaptive coefficient and a scaling factor to adaptively measure the Intersection over Union (IoU) and center point distance for improving box regression performance, was designed to further improve detection performance. The proposed pipeline achieved a recall of 0.8092, precision of 0.8187, and F-1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation) and a recall of 0.9949, precision of 0.9894, and F-1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our pipeline, when trained on a patient-level in-house dataset, obtained a recall of 0.8283, precision of 0.8414, and F-1 score of 0.8348 and, when trained on a slice-level in-house dataset, achieved a recall of 0.6897, precision of 0.7888, and F-1 score of 0.7358, outperforming existing representative detection methods. 
The superior CRC detection performance on the in-house CT dataset and state-of-the-art generalization performance on the public dataset (with a 31.97 %age point improvement in detection sensitivity (recall) over the next closest state-of-the-art method), highlight the potential translational value of our pipeline for CRC clinical decision support, conditional upon validation in larger cohorts.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102717"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-30 | DOI: 10.1016/j.compmedimag.2026.102718
Daniele Molino , Francesco Di Feola , Eliodoro Faiella , Deborah Fazzini , Domiziana Santucci , Linlin Shen , Valerio Guarrasi , Paolo Soda
The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We validate XGeM extensively. First, we benchmark it against five competitors on MIMIC-CXR, a state-of-the-art dataset for multi-view chest X-ray and radiological report generation. Second, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. The project page is at https://cosbidev.github.io/XGeM/.
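Conditioning on an arbitrary subset of input modalities can be illustrated with a toy fusion over whichever modality embeddings happen to be present (the modality names, embeddings, and simple averaging below are assumptions for illustration; XGeM itself uses a learned shared latent space built with contrastive learning):

```python
def fuse_available_modalities(embeddings, available):
    """Average the embeddings of the modalities that are present,
    so any subset of inputs yields one conditioning vector."""
    chosen = [embeddings[name] for name in available]
    if not chosen:
        raise ValueError("at least one modality must be available")
    dim = len(chosen[0])
    return [sum(vec[d] for vec in chosen) / len(chosen) for d in range(dim)]

# Toy 2-D embeddings for three hypothetical modalities.
embeds = {
    "frontal_xray": [1.0, 0.0],
    "lateral_xray": [0.0, 1.0],
    "report":       [1.0, 1.0],
}
z = fuse_available_modalities(embeds, ["frontal_xray", "report"])
```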
{"title":"XGeM: A multi-prompt foundation model for multimodal medical data generation","authors":"Daniele Molino , Francesco Di Feola , Eliodoro Faiella , Deborah Fazzini , Domiziana Santucci , Linlin Shen , Valerio Guarrasi , Paolo Soda","doi":"10.1016/j.compmedimag.2026.102718","DOIUrl":"10.1016/j.compmedimag.2026.102718","url":null,"abstract":"<div><div>The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM by first benchmarking it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. 
Project page is at <span><span>https://cosbidev.github.io/XGeM/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102718"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.compmedimag.2026.102709
Riadh Bouslimi
Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces UG-GraphT5 (Uncertainty-Guided Graph Transformer for Radiology Report Generation), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.
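The calibration figure (ECE) reported above follows the standard binned definition: the average gap between confidence and accuracy per confidence bin, weighted by bin size. A minimal sketch, assuming equal-width bins (the toy confidences below are illustrative):

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """ECE = sum over bins of (bin size / N) * |bin accuracy - bin confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    ece, total = 0.0, len(confidences)
    for samples in bins:
        if not samples:
            continue
        avg_conf = sum(c for c, _ in samples) / len(samples)
        accuracy = sum(1 for _, ok in samples if ok) / len(samples)
        ece += (len(samples) / total) * abs(accuracy - avg_conf)
    return ece

ece = expected_calibration_error([0.95, 0.85, 0.75, 0.65],
                                 [True, True, False, True])
```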
{"title":"A knowledge-guided and uncertainty-calibrated multimodal framework for fracture diagnosis and radiology report generation","authors":"Riadh Bouslimi","doi":"10.1016/j.compmedimag.2026.102709","DOIUrl":"10.1016/j.compmedimag.2026.102709","url":null,"abstract":"<div><div>Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces <em>UG-GraphT5</em> (<em>Uncertainty-Guided Graph Transformer for Radiology Report Generation</em>), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. 
An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102709"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.compmedimag.2026.102713
Emil Benedykciuk, Marcin Denkowski, Grzegorz M. Wójcik
Medical image segmentation is critical for diagnosis, treatment planning, and disease monitoring, yet differs from generic semantic segmentation due to volumetric data, modality-specific artifacts, costly and uncertain expert annotations, and domain shift across scanners and institutions. Neural Architecture Search (NAS) can automate model design, but many NAS paradigms become impractical for 3D segmentation because evaluating large numbers of candidate architectures is computationally prohibitive. Differentiable NAS (DNAS) alleviates this barrier by optimizing relaxed architectural choices with gradients in a weight-sharing supernet, making search feasible under realistic compute and memory budgets. However, DNAS introduces distinct methodological risks (e.g., optimization instability and discretization gap) and raises challenges in reproducibility and clinical deployability. We conduct a PRISMA-inspired systematic review of DNAS for medical image segmentation (multi-database screening, 2018–2025), retaining 33 papers representing 31 unique methods for quantitative analysis. Across the included studies, external validation on independent-site data is rare (~10%), full code release (including search procedures) is limited (~26%), and only a minority substantively addresses search stability (~23%). Despite clear clinical relevance, multi-objective search that explicitly optimizes latency or memory is also uncommon (~23%). We position DNAS within the broader NAS landscape, introduce a segmentation-focused taxonomy, and propose a NAS Reporting Card tailored to medical segmentation to improve transparency, comparability, and reproducibility.
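The continuous relaxation at the heart of DNAS can be sketched in the DARTS style: each searchable edge outputs a softmax-weighted sum of all candidate operations, so the architecture parameters (the alphas) receive gradients like ordinary weights (the candidate operations below are toy stand-ins for real convolution/pooling choices):

```python
import math

def mixed_op(x, ops, alphas):
    """DARTS-style mixed operation: softmax the architecture parameters,
    then blend every candidate op's output by its softmax weight."""
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over alphas
    return sum(w * op(x) for w, op in zip(weights, ops))

# Toy candidates: identity, doubling, and the "zero" (skip) operation.
candidates = [lambda x: x, lambda x: 2 * x, lambda x: 0.0]
y = mixed_op(3.0, candidates, alphas=[0.0, 0.0, 0.0])  # equal weights
```

After search, the discrete architecture keeps only the highest-weighted op per edge, which is exactly where the discretization gap mentioned above arises.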
{"title":"Differentiable Neural Architecture Search for medical image segmentation: A systematic review and field audit","authors":"Emil Benedykciuk, Marcin Denkowski, Grzegorz M. Wójcik","doi":"10.1016/j.compmedimag.2026.102713","DOIUrl":"10.1016/j.compmedimag.2026.102713","url":null,"abstract":"<div><div>Medical image segmentation is critical for diagnosis, treatment planning, and disease monitoring, yet differs from generic semantic segmentation due to volumetric data, modality-specific artifacts, costly and uncertain expert annotations, and domain shift across scanners and institutions. Neural Architecture Search (NAS) can automate model design, but many NAS paradigms become impractical for 3D segmentation because evaluating large numbers of candidate architectures is computationally prohibitive. Differentiable NAS (DNAS) alleviates this barrier by optimizing relaxed architectural choices with gradients in a weight-sharing supernet, making search feasible under realistic compute and memory budgets. However, DNAS introduces distinct methodological risks (e.g., optimization instability and discretization gap) and raises challenges in reproducibility and clinical deployability. We conduct a PRISMA-inspired systematic review of DNAS for medical image segmentation (multi-database screening, 2018-2025), retaining 33 papers representing 31 unique methods for quantitative analysis. Across the included studies, external validation on independent-site data is rare (<span><math><mo>∼</mo></math></span>10%), full code release (including search procedures) is limited (<span><math><mo>∼</mo></math></span>26%), and only a minority substantively addresses search stability (<span><math><mo>∼</mo></math></span>23%). Despite clear clinical relevance, multi-objective search that explicitly optimizes latency or memory is also uncommon (<span><math><mo>∼</mo></math></span>23%). 
We position DNAS within the broader NAS landscape, introduce a segmentation-focused taxonomy, and propose a NAS Reporting Card tailored to medical segmentation to improve transparency, comparability, and reproducibility.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102713"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-02-06 | DOI: 10.1016/j.compmedimag.2026.102721
Xianmin Wang , Mingfeng Lin , Jing Li
Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposes a novel Confidence-Calibrated Contrastive Mean Teacher (C3MT) framework. First, C3MT introduces a contrastive learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a confidence-calibrated and category-aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on the ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. The implementation is publicly available at https://github.com/l1654485/C3MT.
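The Dice score cited in the results follows the standard definition 2|A∩B| / (|A| + |B|); a minimal sketch over flattened binary masks (toy masks for illustration):

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p == 1 and t == 1)
    size = sum(pred_mask) + sum(true_mask)
    # Convention: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * inter / size

d = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```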
{"title":"C3MT: Confidence-Calibrated Contrastive Mean Teacher for semi-supervised medical image segmentation","authors":"Xianmin Wang , Mingfeng Lin , Jing Li","doi":"10.1016/j.compmedimag.2026.102721","DOIUrl":"10.1016/j.compmedimag.2026.102721","url":null,"abstract":"<div><div>Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposed a novel <strong>C</strong>onfidence-<strong>C</strong>alibrated <strong>C</strong>ontrastive <strong>M</strong>ean <strong>T</strong>eacher (C3MT) framework. First, C3MT introduces a Contrastive Learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a Confidence-Calibrated and Category-Aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. 
The implementation is publicly available at <span><span>https://github.com/l1654485/C3MT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102721"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-01Epub Date: 2026-01-13DOI: 10.1016/j.compmedimag.2026.102702
Wietske A.P. Bastiaansen , Melek Rousian , Anton H.J. Koning , Wiro J. Niessen , Bernadette S. de Bakker , Régine P.M. Steegers-Theunissen , Stefan Klein
Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.
{"title":"The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes","authors":"Wietske A.P. Bastiaansen , Melek Rousian , Anton H.J. Koning , Wiro J. Niessen , Bernadette S. de Bakker , Régine P.M. Steegers-Theunissen , Stefan Klein","doi":"10.1016/j.compmedimag.2026.102702","DOIUrl":"10.1016/j.compmedimag.2026.102702","url":null,"abstract":"<div><div>Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. 
The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102702"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
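The key adaptation described in this abstract, penalizing deviations of the evolving atlas from a time-dependent initial atlas during groupwise registration, can be sketched as a regularized objective. A minimal sketch under assumed choices: mean squared error stands in for the paper's actual similarity metric, and the penalty weight `lam` is hypothetical.

```python
import numpy as np

def atlas_loss(atlas, init_atlas_t, warped_images, lam=0.1):
    """Groupwise-registration objective sketch: dissimilarity between
    the evolving atlas and the subject images warped into atlas space,
    plus a penalty keeping the atlas close to a time-dependent initial
    atlas so age-specific anatomy is not averaged away."""
    similarity = np.mean([(atlas - im) ** 2 for im in warped_images])
    deviation = np.mean((atlas - init_atlas_t) ** 2)
    return similarity + lam * deviation
```

Without the `deviation` term, rapid week-to-week anatomical change would be blurred into a single mean shape, which is the failure mode the ablation study reports.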
Pub Date : 2026-02-01Epub Date: 2025-12-26DOI: 10.1016/j.compmedimag.2025.102681
Xiaodong Zhou , Huibin Wang
In 3D reconstruction of X-ray coronary arteries, matching vessel branches in different viewpoints is a challenging task. In this study, this task is transformed into the process of vessel branches instance segmentation and then matching branches of the same color, and an instance segmentation network (YOLO-CAVBIS) is proposed specifically for deformed and dynamic vessels. Firstly, since the left and right coronary artery branches are not easy to distinguish, a coronary artery classification dataset is produced and the left and right coronary arteries are classified using the YOLOv8-cls classification model, and then the classified images are fed into two parallel YOLO-CAVBIS networks for coronary artery branches instance segmentation. Finally, branches with the same color in different viewpoints are matched. The experimental results show that the accuracy of the coronary artery classification model can reach 100%, the mAP50 of the proposed left coronary branches instance segmentation model reaches 98.4%, and the mAP50 of the proposed right coronary branches instance segmentation model reaches 99.4%. In terms of extracting deformation and dynamic vascular features, our proposed YOLO-CAVBIS network demonstrates greater specificity and superiority compared to other instance segmentation networks, and can be used as a baseline model for the task of coronary artery branches instance segmentation. Code repository: https://gitee.com/zaleman/ca_instance_segmentation, https://github.com/zaleman/ca_instance_segmentation.
{"title":"Research on X-ray coronary artery branches instance segmentation and matching task","authors":"Xiaodong Zhou , Huibin Wang","doi":"10.1016/j.compmedimag.2025.102681","DOIUrl":"10.1016/j.compmedimag.2025.102681","url":null,"abstract":"<div><div>In 3D reconstruction of X-ray coronary arteries, matching vessel branches in different viewpoints is a challenging task. In this study, this task is transformed into the process of vessel branches instance segmentation and then matching branches of the same color, and an instance segmentation network (YOLO-CAVBIS) is proposed specifically for deformed and dynamic vessels. Firstly, since the left and right coronary artery branches are not easy to distinguish, a coronary artery classification dataset is produced and the left and right coronary arteries are classified using the YOLOv8-cls classification model, and then the classified images are fed into two parallel YOLO-CAVBIS networks for coronary artery branches instance segmentation. Finally, branches with the same color in different viewpoints are matched. The experimental results show that the accuracy of the coronary artery classification model can reach 100%, the mAP50 of the proposed left coronary branches instance segmentation model reaches 98.4%, and the mAP50 of the proposed right coronary branches instance segmentation model reaches 99.4%. In terms of extracting deformation and dynamic vascular features, our proposed YOLO-CAVBIS network demonstrates greater specificity and superiority compared to other instance segmentation networks, and can be used as a baseline model for the task of coronary artery branches instance segmentation. 
Code repository: <span><span>https://gitee.com/zaleman/ca_instance_segmentation</span><svg><path></path></svg></span>, <span><span>https://github.com/zaleman/ca_instance_segmentation</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102681"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
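The final matching step described above reduces to pairing branches that received the same color (i.e., the same predicted branch class) in the two viewpoints. A minimal sketch, assuming each view is summarized as a mapping from branch class to instance id; the class names in the example are illustrative, not taken from the paper.

```python
def match_branches(view_a, view_b):
    """Match instance-segmented vessel branches across two viewpoints
    by their predicted class ('color'): every branch class detected in
    both views is paired; classes seen in only one view are dropped."""
    return {cls: (view_a[cls], view_b[cls])
            for cls in view_a.keys() & view_b.keys()}
```

For example, `match_branches({'LAD': 1, 'LCX': 2}, {'LAD': 7})` pairs only the branch class present in both views.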
Pub Date : 2026-02-01Epub Date: 2026-01-08DOI: 10.1016/j.compmedimag.2026.102704
Shyam Sundar Debsarkar , V.B. Surya Prasath
Recent advances in deep learning have significantly improved the accuracy of computational pathology; however, conventional model ensembling strategies often lack adaptability and interpretability, hindering clinical applicability. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets: HMU-GC-HE-30K, a gastric cancer dataset containing pathology images only, and BCNB, a multimodal breast cancer biopsy dataset containing both pathology imaging and clinical information. Our proposed multi-expert, LLM-arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs) and transformers, currently the de facto state-of-the-art classification ensemble models, with better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on the datasets, and ablations show that learned per-agent trust materially improves the arbitrator’s decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models, conventional ensembling with majority vote, uniform average, and meta-learners. 
The results obtained highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.
{"title":"A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction","authors":"Shyam Sundar Debsarkar , V.B. Surya Prasath","doi":"10.1016/j.compmedimag.2026.102704","DOIUrl":"10.1016/j.compmedimag.2026.102704","url":null,"abstract":"<div><div>Recent advances in deep learning have significantly improved the accuracy of computational pathology; however, conventional model ensembling strategies often lack adaptability and interpretability, hindering clinical applicability. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets: HMU-GC-HE-30K, a gastric cancer dataset containing pathology images only, and BCNB, a multimodal breast cancer biopsy dataset containing both pathology imaging and clinical information. Our proposed multi-expert, LLM-arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs) and transformers, currently the de facto state-of-the-art classification ensemble models, with better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. 
Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on the datasets, and ablations show that learned per-agent trust materially improves the arbitrator’s decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models, conventional ensembling with majority vote, uniform average, and meta-learners. The results obtained highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102704"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
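The learned per-agent trust mentioned in the ablations can be illustrated with a trust-weighted aggregation of expert predictions. This sketch shows only the aggregation idea; in the described framework the per-expert outputs would additionally be serialized into a prompt for the LLM arbitrator, which is omitted here. Agent names and trust weights are hypothetical.

```python
from collections import defaultdict

def trust_weighted_vote(predictions, trust):
    """Aggregate per-expert class predictions with learned per-agent
    trust weights; unknown agents default to a neutral weight of 1.0.
    Returns the class with the highest total trust."""
    scores = defaultdict(float)
    for agent, label in predictions.items():
        scores[label] += trust.get(agent, 1.0)
    return max(scores, key=scores.get)
```

With trust `{'cnn': 2.0, 'vit': 0.8, 'clinical': 0.8}`, a high-trust agent can outvote two low-trust agents, which is how learned trust changes the outcome relative to a plain majority vote.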