
Computerized Medical Imaging and Graphics: Latest Publications

Learning geometric and visual features for medical image segmentation with vision GNN
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-02-03 · DOI: 10.1016/j.compmedimag.2026.102720
Xinhong Li , Geng Chen , Yuanfeng Wu , Haotian Jiang , Tao Zhou , Yi Zhou , Wentao Zhu
As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and the objects within it as grid-structured data, paying insufficient attention to the relationships between the objects to be segmented. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During the segmentation process, our model first represents the image as a graph and then utilizes the encoder to extract multi-level graph features and image features. Finally, our hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets across three types of lesions: polyps, skin lesions, and retinal vessels. The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.
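The abstract does not include code. As a rough, hypothetical illustration of the "image as a graph" idea behind ViG-style encoders, the sketch below builds a k-nearest-neighbour graph over patch embeddings; the function name, the choice of k, and the feature dimensions are illustrative assumptions, not the authors' MedSegViG implementation.

```python
import numpy as np

def knn_patch_graph(patch_features: np.ndarray, k: int = 8) -> np.ndarray:
    """Connect each image patch (graph node) to its k nearest neighbours in feature space.

    patch_features: (N, D) array, one row per patch embedding.
    Returns an (N, N) boolean adjacency matrix.
    """
    # Pairwise squared Euclidean distances between patch embeddings.
    sq_norms = (patch_features ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * patch_features @ patch_features.T
    np.fill_diagonal(dists, np.inf)          # do not link a node to itself
    neighbours = np.argsort(dists, axis=1)[:, :k]
    adj = np.zeros(dists.shape, dtype=bool)
    rows = np.repeat(np.arange(len(patch_features)), k)
    adj[rows, neighbours.ravel()] = True
    return adj

# Example: a 14x14 grid of 64-dim patch embeddings becomes a 196-node graph.
feats = np.random.default_rng(0).normal(size=(196, 64))
print(knn_patch_graph(feats, k=8).sum(axis=1)[:5])  # each node has 8 outgoing edges
```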
{"title":"Learning geometric and visual features for medical image segmentation with vision GNN","authors":"Xinhong Li ,&nbsp;Geng Chen ,&nbsp;Yuanfeng Wu ,&nbsp;Haotian Jiang ,&nbsp;Tao Zhou ,&nbsp;Yi Zhou ,&nbsp;Wentao Zhu","doi":"10.1016/j.compmedimag.2026.102720","DOIUrl":"10.1016/j.compmedimag.2026.102720","url":null,"abstract":"<div><div>As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and objects within it as grid-structural data, while insufficient attention is given to relationships between the objects to segment. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During the segmentation process, our model first represents the image as a graph and then utilizes the encoder to extract multi-level graph features and image features. Finally, our hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets across three types of lesions: polyps, skin lesions, and retinal vessels. The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102720"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MRI-based deep learning model predicts recurrent nasopharyngeal carcinoma in post-radiation nasopharyngeal necrosis
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-29 · DOI: 10.1016/j.compmedimag.2026.102711
Chao Lin , Jiong-Lin Liang , Jia Guo , Yu-Long Xie , Weidong Cheng , Guo-Heng Huang , Qing-Wen Lin , Jian-Wu Chen , Tong Xiang , Hai-Qiang Mai , Qi Yang

Background

The pretreatment identification of post-radiation nasopharyngeal necrosis (PRNN) combined with recurrent nasopharyngeal carcinoma (referred to as cancer-infiltrative PRNN) is crucial for the diagnosis and treatment of PRNN. As the first study to identify recurrent nasopharyngeal carcinoma in patients with PRNN, we aimed to develop a deep learning (DL)-based predictive model using routine MRI to distinguish cancer-infiltrative PRNN from cancer-free PRNN.

Methods

MRIs of 437 patients with PRNN were manually labeled and randomly divided into training and validation cohorts. Video Swin Transformer and Multilayer Perceptron were employed to construct the DL model. The integrated DL and clinical model (DCcombined model) and the integrated radiomics and clinical model (RCcombined model) were constructed using linear weighted fusion of the prediction results from the two models. The predictive value of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity.
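The Methods state only that the DCcombined and RCcombined models are built by linear weighted fusion of the two component models' predictions. A minimal sketch of such a fusion is shown below; the weight value and variable names are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np

def fuse_predictions(p_deep: np.ndarray, p_clinical: np.ndarray, w: float = 0.6) -> np.ndarray:
    """Linearly weighted fusion of two models' predicted probabilities.

    p_deep, p_clinical: per-patient probabilities of cancer-infiltrative PRNN in [0, 1].
    w: weight on the deep-learning model; (1 - w) goes to the clinical model.
    """
    return w * p_deep + (1.0 - w) * p_clinical

p_dl = np.array([0.91, 0.22, 0.67])
p_cl = np.array([0.80, 0.35, 0.40])
print(fuse_predictions(p_dl, p_cl))  # [0.866 0.272 0.562]
```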

Results

The DCcombined model significantly outperformed the radiologists in terms of AUC (0.83 vs. 0.60, p < 0.001), accuracy (0.78 vs. 0.60, p = 0.002), and sensitivity (0.86 vs. 0.62, p = 0.002) in the validation cohort. The DCcombined model showed the highest validation sensitivity of 0.86 (95 % CI 0.77–0.94), whereas the RCcombined model demonstrated the highest specificity of 0.88 (95 % CI 0.81–0.96).

Conclusions

Our DCcombined model based on DL can noninvasively distinguish cancer-infiltrative PRNN from cancer-free PRNN with higher AUC, accuracy, and sensitivity than those of radiologists and better sensitivity than that of the RCcombined model based on radiomics.
{"title":"MRI-based deep learning model predicts recurrent nasopharyngeal carcinoma in post-radiation nasopharyngeal necrosis","authors":"Chao Lin ,&nbsp;Jiong-Lin Liang ,&nbsp;Jia Guo ,&nbsp;Yu-Long Xie ,&nbsp;Weidong Cheng ,&nbsp;Guo-Heng Huang ,&nbsp;Qing-Wen Lin ,&nbsp;Jian-Wu Chen ,&nbsp;Tong Xiang ,&nbsp;Hai-Qiang Mai ,&nbsp;Qi Yang","doi":"10.1016/j.compmedimag.2026.102711","DOIUrl":"10.1016/j.compmedimag.2026.102711","url":null,"abstract":"<div><h3>Background</h3><div>The pretreatment identification of post-radiation nasopharyngeal necrosis (PRNN) combined with recurrent nasopharyngeal carcinoma (referred to as cancer-infiltrative PRNN) is crucial for the diagnosis and treatment of PRNN. As the first study to identify recurrent nasopharyngeal carcinoma in patients with PRNN, we aimed to develop a deep learning (DL)-based predictive model using routine MRI to distinguish cancer-infiltrative PRNN from cancer-free PRNN.</div></div><div><h3>Methods</h3><div>MRIs of 437 patients with PRNN were manually labeled and randomly divided into training and validation cohorts. Video Swin Transformer and Multilayer Perceptron were employed to construct the DL model. The integrated DL and clinical model (DC<sub>combined</sub> model) and the integrated radiomics and clinical model (RC<sub>combined</sub> model) were constructed using linear weighted fusion of the prediction results from the two models. The predictive value of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity.</div></div><div><h3>Results</h3><div>The DC<sub>combined</sub> model significantly outperformed the radiologists in terms of AUC (0.83 vs. 0.60, p &lt; 0.001), accuracy (0.78 vs. 0.60, p = 0.002), and sensitivity (0.86 vs. 0.62, p = 0.002) in the validation cohort. The DC<sub>combined</sub> model showed the highest validation sensitivity of 0.86 (95 % CI 0.77–0.94), whereas the RC<sub>combined</sub> model demonstrated the highest specificity of 0.88 (95 % CI 0.81–0.96).</div></div><div><h3>Conclusions</h3><div>Our DC<sub>combined</sub> model based on DL can noninvasively distinguish cancer-infiltrative PRNN from cancer-free PRNN with higher AUC, accuracy, and sensitivity than those of radiologists and better sensitivity than that of the RC<sub>combined</sub> model based on radiomics.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102711"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146100664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A deep learning-based automated pipeline for colorectal cancer detection in contrast-enhanced CT images
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-26 · DOI: 10.1016/j.compmedimag.2026.102717
Chenhui Qiu , Sarah Miller , Barathi Subramanian , Angela Ryu , Haiyu Zhang , George A. Fisher , Nigam H. Shah , John Mongan , Curtis Langlotz , Peter Poullos , Jeanne Shen
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and a leading cause of cancer-related mortality. This study aims to investigate an automatic detection pipeline for identification and localization of the primary CRC in portal venous phase contrast-enhanced CT scans, which is a crucial first step for downstream CRC staging, prognostication, and treatment planning. We propose a deep learning-based automated detection pipeline using YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. Additionally, a scale-adaptive loss function, which introduces an adaptive coefficient and a scaling factor to adaptively measure the Intersection over Union (IoU) and center point distance for improving box regression performance, was designed to further improve detection performance. The proposed pipeline achieved a recall of 0.8092, precision of 0.8187, and F-1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation) and a recall of 0.9949, precision of 0.9894, and F-1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our pipeline, when trained on a patient-level in-house dataset, obtained a recall of 0.8283, precision of 0.8414, and F-1 score of 0.8348 and, when trained on a slice-level in-house dataset, achieved a recall of 0.6897, precision of 0.7888, and F-1 score of 0.7358, outperforming existing representative detection methods. The superior CRC detection performance on the in-house CT dataset and state-of-the-art generalization performance on the public dataset (with a 31.97 percentage-point improvement in detection sensitivity (recall) over the next closest state-of-the-art method) highlight the potential translational value of our pipeline for CRC clinical decision support, conditional upon validation in larger cohorts.
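The abstract describes a scale-adaptive box-regression loss that combines IoU and centre-point distance but does not give its exact form. The sketch below is a generic, DIoU-style toy version in which a hypothetical coefficient alpha up-weights the centre term for small ground-truth boxes; it is not the authors' loss.

```python
import numpy as np

def iou_and_center_terms(box_p, box_g):
    """box = (x1, y1, x2, y2). Returns IoU and the normalised squared centre distance."""
    xa, ya = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    xb, yb = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # Squared distance between box centres, normalised by the enclosing-box diagonal.
    cpx, cpy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cgx, cgy = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    ex1, ey1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    ex2, ey2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    dist2 = ((cpx - cgx) ** 2 + (cpy - cgy) ** 2) / diag2
    return iou, dist2

def scale_adaptive_box_loss(box_p, box_g, image_area=512 * 512):
    """Toy scale-adaptive regression loss: smaller ground-truth boxes get a larger centre-term weight."""
    iou, dist2 = iou_and_center_terms(box_p, box_g)
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    alpha = 1.0 - area_g / image_area      # hypothetical adaptive coefficient in (0, 1)
    return (1.0 - iou) + alpha * dist2

print(scale_adaptive_box_loss((100, 100, 160, 150), (110, 105, 170, 160)))
```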
{"title":"A deep learning-based automated pipeline for colorectal cancer detection in contrast-enhanced CT images","authors":"Chenhui Qiu ,&nbsp;Sarah Miller ,&nbsp;Barathi Subramanian ,&nbsp;Angela Ryu ,&nbsp;Haiyu Zhang ,&nbsp;George A. Fisher ,&nbsp;Nigam H. Shah ,&nbsp;John Mongan ,&nbsp;Curtis Langlotz ,&nbsp;Peter Poullos ,&nbsp;Jeanne Shen","doi":"10.1016/j.compmedimag.2026.102717","DOIUrl":"10.1016/j.compmedimag.2026.102717","url":null,"abstract":"<div><div>Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and a leading cause of cancer-related mortality. This study aims to investigate an automatic detection pipeline for identification and localization of the primary CRC in portal venous phase contrast-enhanced CT scans, which is a crucial first step for downstream CRC staging, prognostication, and treatment planning. We propose a deep learning-based automated detection pipeline using YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. Additionally, a scale-adaptive loss function, which introduces an adaptive coefficient and a scaling factor to adaptively measure the Intersection over Union (IoU) and center point distance for improving box regression performance, was designed to further improve detection performance. The proposed pipeline achieved a recall of 0.8092, precision of 0.8187, and F-1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation) and a recall of 0.9949, precision of 0.9894, and F-1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our pipeline, when trained on a patient-level in-house dataset, obtained a recall of 0.8283, precision of 0.8414, and F-1 score of 0.8348 and, when trained on a slice-level in-house dataset, achieved a recall of 0.6897, precision of 0.7888, and F-1 score of 0.7358, outperforming existing representative detection methods. The superior CRC detection performance on the in-house CT dataset and state-of-the-art generalization performance on the public dataset (with a 31.97 %age point improvement in detection sensitivity (recall) over the next closest state-of-the-art method), highlight the potential translational value of our pipeline for CRC clinical decision support, conditional upon validation in larger cohorts.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102717"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
XGeM: A multi-prompt foundation model for multimodal medical data generation
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-30 · DOI: 10.1016/j.compmedimag.2026.102718
Daniele Molino , Francesco Di Feola , Eliodoro Faiella , Deborah Fazzini , Domiziana Santucci , Linlin Shen , Valerio Guarrasi , Paolo Soda
The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM by first benchmarking it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at https://cosbidev.github.io/XGeM/.
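XGeM's Multi-Prompt Training conditions generation on arbitrary subsets of input modalities. As a loose, hypothetical sketch of that idea, the snippet below fuses whichever modality embeddings happen to be available into a single conditioning vector; the modality names and the fusion rule (a simple mean) are assumptions for illustration, not the model's actual mechanism.

```python
import numpy as np

def fuse_available_prompts(prompts: dict, dim: int = 16) -> np.ndarray:
    """Toy multi-prompt conditioning: average the embeddings of whichever modalities are present.

    prompts: mapping like {"frontal_xray": vec, "report": vec, ...}; any subset may be supplied.
    Returns a single conditioning vector (zeros if no modality is given).
    """
    available = [v for v in prompts.values() if v is not None]
    if not available:
        return np.zeros(dim)
    return np.mean(np.stack(available), axis=0)

rng = np.random.default_rng(1)
emb = {"frontal_xray": rng.normal(size=16), "lateral_xray": None, "report": rng.normal(size=16)}
cond = fuse_available_prompts(emb)
print(cond.shape)  # (16,): one conditioning vector regardless of which subset was provided
```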
{"title":"XGeM: A multi-prompt foundation model for multimodal medical data generation","authors":"Daniele Molino ,&nbsp;Francesco Di Feola ,&nbsp;Eliodoro Faiella ,&nbsp;Deborah Fazzini ,&nbsp;Domiziana Santucci ,&nbsp;Linlin Shen ,&nbsp;Valerio Guarrasi ,&nbsp;Paolo Soda","doi":"10.1016/j.compmedimag.2026.102718","DOIUrl":"10.1016/j.compmedimag.2026.102718","url":null,"abstract":"<div><div>The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM by first benchmarking it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at <span><span>https://cosbidev.github.io/XGeM/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102718"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A knowledge-guided and uncertainty-calibrated multimodal framework for fracture diagnosis and radiology report generation
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-27 · DOI: 10.1016/j.compmedimag.2026.102709
Riadh Bouslimi
Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces UG-GraphT5 (Uncertainty-Guided Graph Transformer for Radiology Report Generation), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.
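The abstract mentions Bayesian uncertainty estimation without specifying the approximation used. One common, generic option is Monte Carlo sampling of stochastic forward passes; the toy sketch below computes a mean prediction and predictive entropy from such samples and is illustrative only, not UG-GraphT5's calibration procedure.

```python
import numpy as np

def predictive_uncertainty(sample_probs: np.ndarray):
    """Mean prediction and predictive entropy from stochastic forward passes (MC-dropout style).

    sample_probs: (T, C) array of softmax outputs from T stochastic passes over C classes.
    """
    mean_p = sample_probs.mean(axis=0)
    entropy = float(-(mean_p * np.log(mean_p + 1e-12)).sum())
    return mean_p, entropy

# Three stochastic passes over a binary "fracture / no fracture" decision.
samples = np.array([[0.80, 0.20], [0.65, 0.35], [0.72, 0.28]])
mean_p, h = predictive_uncertainty(samples)
print(mean_p, round(h, 3))  # higher entropy flags cases for uncertainty-aware wording in the report
```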
{"title":"A knowledge-guided and uncertainty-calibrated multimodal framework for fracture diagnosis and radiology report generation","authors":"Riadh Bouslimi","doi":"10.1016/j.compmedimag.2026.102709","DOIUrl":"10.1016/j.compmedimag.2026.102709","url":null,"abstract":"<div><div>Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces <em>UG-GraphT5</em> (<em>Uncertainty-Guided Graph Transformer for Radiology Report Generation</em>), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102709"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Differentiable Neural Architecture Search for medical image segmentation: A systematic review and field audit
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-20 · DOI: 10.1016/j.compmedimag.2026.102713
Emil Benedykciuk, Marcin Denkowski, Grzegorz M. Wójcik
Medical image segmentation is critical for diagnosis, treatment planning, and disease monitoring, yet differs from generic semantic segmentation due to volumetric data, modality-specific artifacts, costly and uncertain expert annotations, and domain shift across scanners and institutions. Neural Architecture Search (NAS) can automate model design, but many NAS paradigms become impractical for 3D segmentation because evaluating large numbers of candidate architectures is computationally prohibitive. Differentiable NAS (DNAS) alleviates this barrier by optimizing relaxed architectural choices with gradients in a weight-sharing supernet, making search feasible under realistic compute and memory budgets. However, DNAS introduces distinct methodological risks (e.g., optimization instability and discretization gap) and raises challenges in reproducibility and clinical deployability. We conduct a PRISMA-inspired systematic review of DNAS for medical image segmentation (multi-database screening, 2018-2025), retaining 33 papers representing 31 unique methods for quantitative analysis. Across the included studies, external validation on independent-site data is rare (~10%), full code release (including search procedures) is limited (~26%), and only a minority substantively addresses search stability (~23%). Despite clear clinical relevance, multi-objective search that explicitly optimizes latency or memory is also uncommon (~23%). We position DNAS within the broader NAS landscape, introduce a segmentation-focused taxonomy, and propose a NAS Reporting Card tailored to medical segmentation to improve transparency, comparability, and reproducibility.
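For readers unfamiliar with the continuous relaxation at the heart of DNAS, the generic DARTS-style sketch below shows how a softmax over architecture parameters turns a discrete operator choice into a differentiable weighted sum; the candidate operations and parameter values are placeholders, not any specific reviewed method.

```python
import numpy as np

def mixed_op(x: np.ndarray, alphas: np.ndarray, candidate_ops) -> np.ndarray:
    """DARTS-style continuous relaxation: a softmax over architecture parameters
    replaces a discrete choice between candidate operations with a weighted sum."""
    w = np.exp(alphas - alphas.max())
    w = w / w.sum()
    return sum(wi * op(x) for wi, op in zip(w, candidate_ops))

ops = [
    lambda x: x,                     # identity / skip connection
    lambda x: np.maximum(x, 0.0),    # ReLU stand-in for a conv branch
    lambda x: np.zeros_like(x),      # "zero" op, effectively pruning the edge
]
x = np.array([-1.0, 2.0, -0.5])
alphas = np.array([0.2, 1.5, -1.0])  # learned jointly with supernet weights by gradient descent
print(mixed_op(x, alphas, ops))
```

After search, the highest-weighted operation on each edge is kept, which is exactly where the discretization gap mentioned in the review arises.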
{"title":"Differentiable Neural Architecture Search for medical image segmentation: A systematic review and field audit","authors":"Emil Benedykciuk,&nbsp;Marcin Denkowski,&nbsp;Grzegorz M. Wójcik","doi":"10.1016/j.compmedimag.2026.102713","DOIUrl":"10.1016/j.compmedimag.2026.102713","url":null,"abstract":"<div><div>Medical image segmentation is critical for diagnosis, treatment planning, and disease monitoring, yet differs from generic semantic segmentation due to volumetric data, modality-specific artifacts, costly and uncertain expert annotations, and domain shift across scanners and institutions. Neural Architecture Search (NAS) can automate model design, but many NAS paradigms become impractical for 3D segmentation because evaluating large numbers of candidate architectures is computationally prohibitive. Differentiable NAS (DNAS) alleviates this barrier by optimizing relaxed architectural choices with gradients in a weight-sharing supernet, making search feasible under realistic compute and memory budgets. However, DNAS introduces distinct methodological risks (e.g., optimization instability and discretization gap) and raises challenges in reproducibility and clinical deployability. We conduct a PRISMA-inspired systematic review of DNAS for medical image segmentation (multi-database screening, 2018-2025), retaining 33 papers representing 31 unique methods for quantitative analysis. Across the included studies, external validation on independent-site data is rare (<span><math><mo>∼</mo></math></span>10%), full code release (including search procedures) is limited (<span><math><mo>∼</mo></math></span>26%), and only a minority substantively addresses search stability (<span><math><mo>∼</mo></math></span>23%). Despite clear clinical relevance, multi-objective search that explicitly optimizes latency or memory is also uncommon (<span><math><mo>∼</mo></math></span>23%). We position DNAS within the broader NAS landscape, introduce a segmentation-focused taxonomy, and propose a NAS Reporting Card tailored to medical segmentation to improve transparency, comparability, and reproducibility.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102713"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
C3MT: Confidence-Calibrated Contrastive Mean Teacher for semi-supervised medical image segmentation
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-02-06 · DOI: 10.1016/j.compmedimag.2026.102721
Xianmin Wang , Mingfeng Lin , Jing Li
Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposes a novel Confidence-Calibrated Contrastive Mean Teacher (C3MT) framework. First, C3MT introduces a Contrastive Learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a Confidence-Calibrated and Category-Aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. The implementation is publicly available at https://github.com/l1654485/C3MT.
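The abstract describes confidence-calibrated filtering of pseudo-labels without giving details; the released code linked above contains the actual mechanism. As a generic, minimal illustration, the sketch below keeps only pseudo-labels whose teacher confidence exceeds a threshold; the threshold tau and the binary-segmentation setting are assumptions, not C3MT's calibration procedure.

```python
import numpy as np

def confident_pseudo_labels(prob_map: np.ndarray, tau: float = 0.9):
    """Keep only pseudo-labels whose predicted class confidence exceeds tau.

    prob_map: (H, W) foreground probabilities from the teacher model.
    Returns (pseudo_label, mask) where mask marks pixels used in the unsupervised loss.
    """
    confidence = np.maximum(prob_map, 1.0 - prob_map)  # confidence of the argmax class
    pseudo_label = (prob_map >= 0.5).astype(np.uint8)
    mask = confidence >= tau                            # unreliable pixels are ignored
    return pseudo_label, mask

probs = np.array([[0.97, 0.55], [0.10, 0.92]])
labels, keep = confident_pseudo_labels(probs)
print(labels)
print(keep)  # the ambiguous 0.55 pixel is excluded from training
```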
{"title":"C3MT: Confidence-Calibrated Contrastive Mean Teacher for semi-supervised medical image segmentation","authors":"Xianmin Wang ,&nbsp;Mingfeng Lin ,&nbsp;Jing Li","doi":"10.1016/j.compmedimag.2026.102721","DOIUrl":"10.1016/j.compmedimag.2026.102721","url":null,"abstract":"<div><div>Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposed a novel <strong>C</strong>onfidence-<strong>C</strong>alibrated <strong>C</strong>ontrastive <strong>M</strong>ean <strong>T</strong>eacher (C3MT) framework. First, C3MT introduces a Contrastive Learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a Confidence-Calibrated and Category-Aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. The implementation is publicly available at <span><span>https://github.com/l1654485/C3MT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102721"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-13 · DOI: 10.1016/j.compmedimag.2026.102702
Wietske A.P. Bastiaansen , Melek Rousian , Anton H.J. Koning , Wiro J. Niessen , Bernadette S. de Bakker , Régine P.M. Steegers-Theunissen , Stefan Klein
Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.
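The method penalizes deviations of the evolving atlas from a time-dependent initial atlas. The toy objective below sketches that idea for one gestational-age bin under simplifying assumptions (mean-squared-error image matching, a scalar weight lam, tiny 2D arrays); it is not the registration loss used in the paper.

```python
import numpy as np

def atlas_update_loss(atlas, warped_images, initial_atlas_t, lam=0.1):
    """Toy objective for one gestational-age bin: image matching plus a penalty that
    keeps the evolving atlas close to a time-dependent initial atlas for that age."""
    similarity = np.mean([(atlas - img) ** 2 for img in warped_images])  # MSE to warped subject images
    anchor = np.mean((atlas - initial_atlas_t) ** 2)                     # deviation-from-initial-atlas penalty
    return similarity + lam * anchor

rng = np.random.default_rng(2)
init_t = rng.normal(size=(8, 8))                                # hypothetical age-specific initial atlas
subjects = [init_t + 0.05 * rng.normal(size=(8, 8)) for _ in range(3)]
print(atlas_update_loss(init_t, subjects, init_t))              # small: atlas matches subjects and anchor
```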
{"title":"The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes","authors":"Wietske A.P. Bastiaansen ,&nbsp;Melek Rousian ,&nbsp;Anton H.J. Koning ,&nbsp;Wiro J. Niessen ,&nbsp;Bernadette S. de Bakker ,&nbsp;Régine P.M. Steegers-Theunissen ,&nbsp;Stefan Klein","doi":"10.1016/j.compmedimag.2026.102702","DOIUrl":"10.1016/j.compmedimag.2026.102702","url":null,"abstract":"<div><div>Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides a unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102702"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Research on X-ray coronary artery branches instance segmentation and matching task
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2025-12-26 · DOI: 10.1016/j.compmedimag.2025.102681
Xiaodong Zhou , Huibin Wang
In 3D reconstruction of X-ray coronary arteries, matching vessel branches across different viewpoints is a challenging task. In this study, the task is reformulated as instance segmentation of vessel branches followed by matching of branches with the same color, and an instance segmentation network (YOLO-CAVBIS) is proposed specifically for deformed and dynamic vessels. First, since the left and right coronary artery branches are not easy to distinguish, a coronary artery classification dataset is produced and the left and right coronary arteries are classified using the YOLOv8-cls classification model; the classified images are then fed into two parallel YOLO-CAVBIS networks for coronary artery branch instance segmentation. Finally, branches with the same color in different viewpoints are matched. The experimental results show that the accuracy of the coronary artery classification model reaches 100%, the mAP50 of the proposed left coronary branch instance segmentation model reaches 98.4%, and the mAP50 of the proposed right coronary branch instance segmentation model reaches 99.4%. In extracting deformation and dynamic vascular features, the proposed YOLO-CAVBIS network demonstrates greater specificity and superiority than other instance segmentation networks, and can serve as a baseline model for coronary artery branch instance segmentation. Code repository: https://gitee.com/zaleman/ca_instance_segmentation, https://github.com/zaleman/ca_instance_segmentation.
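After instance segmentation, branches that receive the same label (rendered as the same color) are matched across viewpoints; the linked repository contains the authors' actual pipeline. A minimal, hypothetical sketch of that final matching step is given below; the branch labels (LAD, LCX, D1) and mask identifiers are illustrative only.

```python
def match_branches(view_a: dict, view_b: dict) -> tuple:
    """Pair up instance masks that received the same branch label (same colour)
    in two projection views; branches without a counterpart are reported separately.

    view_a / view_b: mapping {branch_label: mask_or_id} from the instance-segmentation output.
    """
    shared = sorted(set(view_a) & set(view_b))
    matches = [(label, view_a[label], view_b[label]) for label in shared]
    unmatched = sorted(set(view_a) ^ set(view_b))
    return matches, unmatched

a = {"LAD": "mask_a1", "LCX": "mask_a2", "D1": "mask_a3"}
b = {"LAD": "mask_b1", "LCX": "mask_b2"}
print(match_branches(a, b))  # LAD and LCX are matched across views; D1 has no counterpart
```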
{"title":"Research on X-ray coronary artery branches instance segmentation and matching task","authors":"Xiaodong Zhou ,&nbsp;Huibin Wang","doi":"10.1016/j.compmedimag.2025.102681","DOIUrl":"10.1016/j.compmedimag.2025.102681","url":null,"abstract":"<div><div>In the task of 3D reconstruction of X-ray coronary artery, matching vessel branches in different viewpoints is a challenging task. In this study, this task is transformed into the process of vessel branches instance segmentation and then matching branches of the same color, and an instance segmentation network (YOLO-CAVBIS) is proposed specifically for deformed and dynamic vessels. Firstly, since the left and right coronary artery branches are not easy to distinguish, a coronary artery classification dataset is produced and the left and right coronary artery arteries are classified using the YOLOv8-cls classification model, and then the classified images are fed into two parallel YOLO-CAVBIS networks for coronary artery branches instance segmentation. Finally, the branches with the same color of branches in different viewpoints are matched. The experimental results show that the accuracy of the coronary artery classification model can reach 100%, and the mAP50 of the proposed left coronary branches instance segmentation model reaches 98.4%, and the mAP50 of the proposed right coronary branches instance segmentation model reaches 99.4%. In terms of extracting deformation and dynamic vascular features, our proposed YOLO-CAVBIS network demonstrates greater specificity and superiority compared to other instance segmentation networks, and can be used as a baseline model for the task of coronary artery branches instance segmentation. Code repository: <span><span>https://gitee.com/zaleman/ca_instance_segmentation</span><svg><path></path></svg></span>, <span><span>https://github.com/zaleman/ca_instance_segmentation</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102681"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction
IF 4.9 · CAS Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL · Pub Date: 2026-02-01 · Epub Date: 2026-01-08 · DOI: 10.1016/j.compmedimag.2026.102704
Shyam Sundar Debsarkar , V.B. Surya Prasath
Recent advances in deep learning have significantly improved the accuracy of computational pathology; however, conventional model ensembling strategies often lack adaptability and interpretability, hindering clinical adoption. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets: HMU-GC-HE-30K, a gastric cancer dataset containing pathology images only, and BCNB, a multimodal breast cancer biopsy dataset containing pathology imaging and clinical information. Our proposed multi-expert, LLM-arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs) and transformers, currently the de facto state-of-the-art classification ensemble models, with better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on the datasets, and ablations show that learned per-agent trust materially improves the arbitrator's decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models, conventional ensembling with majority vote, uniform averaging, and meta-learners. The results highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.
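The abstract does not show how expert outputs are handed to the LLM arbitrator. The snippet below sketches one plausible way to assemble an arbitration prompt from expert predictions; the expert names, fields, and wording are assumptions, and the actual LLM call is deliberately left out because the paper's interface is not described here.

```python
def build_arbitration_prompt(expert_outputs: dict, clinical_summary: str) -> str:
    """Assemble a plain-text prompt asking an LLM to arbitrate between disagreeing experts.

    expert_outputs: {expert_name: (predicted_label, confidence)}.
    Any chat-completion client could consume the returned string.
    """
    lines = ["You are arbitrating between pathology AI experts.",
             f"Clinical summary: {clinical_summary}",
             "Expert predictions:"]
    for name, (label, conf) in expert_outputs.items():
        lines.append(f"- {name}: {label} (confidence {conf:.2f})")
    lines.append("Return the most defensible final label and a short justification.")
    return "\n".join(lines)

experts = {"cnn_patch_model": ("malignant", 0.81),
           "vit_slide_model": ("benign", 0.55),
           "clinical_feature_model": ("malignant", 0.74)}
print(build_arbitration_prompt(experts, "62-year-old, ER-positive biopsy"))
```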
{"title":"A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction","authors":"Shyam Sundar Debsarkar ,&nbsp;V.B. Surya Prasath","doi":"10.1016/j.compmedimag.2026.102704","DOIUrl":"10.1016/j.compmedimag.2026.102704","url":null,"abstract":"<div><div>Recent advances in deep learning have significantly improved the accuracy of computational pathology; however conventional model ensembling strategies often lack adaptability and interpretability hindering the clinical adaptability. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts, and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets, namely the HMU-GC-HE-30K which is a gastric cancer dataset containing pathology images only, and the BCNB which is a breast cancer biopsy dataset that is multimodal — contains pathology imaging and clinical information. Our proposed multi-expert, LLM arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs), and transformers, which are currently the de facto and state-of-the-art classification ensemble models, with better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on the datasets, and ablations show that learned per-agent trust materially improves the arbitrator’s decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models, conventional ensembling with majority vote, uniform average, and meta-learners. The results obtained highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102704"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0