
Computerized Medical Imaging and Graphics: Latest Publications

C3MT: Confidence-Calibrated Contrastive Mean Teacher for semi-supervised medical image segmentation.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-06 DOI: 10.1016/j.compmedimag.2026.102721
Xianmin Wang, Mingfeng Lin, Jing Li

Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposes a novel Confidence-Calibrated Contrastive Mean Teacher (C3MT) framework. First, C3MT introduces a Contrastive Learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a Confidence-Calibrated and Category-Aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on the ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. The implementation is publicly available at https://github.com/l1654485/C3MT.
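Purely as an illustration of two ingredients named in the abstract (a mean-teacher weight update and confidence-based pseudo-label filtering), here is a minimal PyTorch sketch; the function names, the EMA coefficient, and the 0.9 threshold are assumptions of this sketch, not the authors' implementation (their code is in the linked repository).

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher update: exponential moving average of student weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.data.mul_(alpha).add_(s_param.data, alpha=1.0 - alpha)


def confident_pseudo_labels(teacher_logits, threshold=0.9):
    """Keep only pseudo-labels whose softmax confidence exceeds a threshold."""
    probs = torch.softmax(teacher_logits, dim=1)      # (B, C, H, W)
    confidence, hard_labels = probs.max(dim=1)        # both (B, H, W)
    return hard_labels, confidence > threshold


def masked_pseudo_label_loss(student_logits, hard_labels, mask):
    """Cross-entropy on unlabeled images, restricted to confident pixels."""
    loss = F.cross_entropy(student_logits, hard_labels, reduction="none")
    if mask.any():
        return (loss * mask.float()).sum() / mask.float().sum()
    return loss.new_zeros(())

# ema_update(teacher, student) would typically be called after every optimizer step.
```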

Citations: 0
VFA-Net3D: A zero-shot vascular flow-guided 3D network for brain vessel segmentation in acute ischemic stroke
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-06 DOI: 10.1016/j.compmedimag.2026.102712
Asim Zaman, Mazen M. Yassin, Rashid Khan, Faizan Ahmad, Guangtao Huang, Yongkang Shi, Ziran Chen, Yan Kang
Timely intervention in acute ischemic stroke (AIS) is critical, with approximately 1.9 million neurons lost per minute. Clinical time-of-flight magnetic resonance angiography (TOF-MRA) protocols, designed for rapid acquisition within a few minutes, introduce substantial domain shifts compared to high-resolution research datasets. These include reduced resolution, a lower signal-to-noise ratio, partial-volume effects, and motion artifacts, which are compounded by stroke-specific vascular abnormalities. Conventional segmentation models often fail under such conditions due to limited robustness to domain variability. We propose Vascular Flow-Attention Network (VFA-Net), a fully 3D deep neural network designed for zero-shot vessel segmentation in AIS, enabling knowledge transfer from annotated healthy TOF-MRA scans without retraining on pathological or low-quality clinical data. The architecture integrates five novel modules: (1) Flow-Pattern Attention for vascular continuity; (2) Multi-Scale Context Aggregation using dilated attention; (3) Vascular Flow Feature Refinement for adaptive attention enhancement; (4) Boundary-Guided Skip for precise boundary delineation; and (5) Flow-Refined Up-sampling to recover fine vessel details. The proposed model, trained exclusively on healthy TOF-MRA scans from four vendors (1.5 T and 3 T), was evaluated across three experimental configurations and significantly outperformed state-of-the-art models. By explicitly encoding vascular domain knowledge (continuity, boundaries, and flow topology) into its architecture, VFA-Net3D functions as a knowledge-guided system, enabling robust zero-shot generalization across clinical domains. VFA-Net3D presents a robust and clinically deployable solution for AIS vessel segmentation, supporting faster and more accurate diagnosis and treatment planning.
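As a loose analogue of the Multi-Scale Context Aggregation idea mentioned above (and not the paper's actual module), the sketch below shows a minimal 3D block with parallel dilated convolutions fused by a pointwise convolution; the channel count and dilation rates are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


class DilatedContext3D(nn.Module):
    """Minimal multi-scale 3D context block: parallel dilated convolutions
    whose outputs are fused by a 1x1x1 convolution, plus a residual connection."""

    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.fuse = nn.Conv3d(channels * len(dilations), channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Concatenate multi-scale responses along the channel axis, then fuse.
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.fuse(feats)) + x

# Example: a 3D feature map of shape (batch, channels, depth, height, width).
x = torch.randn(1, 16, 32, 64, 64)
print(DilatedContext3D(16)(x).shape)  # torch.Size([1, 16, 32, 64, 64])
```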
Citations: 0
An interpretable machine learning framework with data-informed imaging biomarkers for diagnosis and prediction of Alzheimer's disease.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-06 DOI: 10.1016/j.compmedimag.2026.102722
Wenjie Kang, Bo Li, Lize C Jiskoot, Peter Paul De Deyn, Geert Jan Biessels, Huiberdina L Koek, Jurgen A H R Claassen, Huub A M Middelkoop, Wiesje M van der Flier, Willemijn J Jansen, Stefan Klein, Esther E Bron

Machine learning methods based on imaging and other clinical data have shown great potential for improving the early and accurate diagnosis of Alzheimer's disease (AD). However, for most deep learning models, especially those including high-dimensional imaging data, the decision-making process remains largely opaque, which limits clinical applicability. Explainable Boosting Machines (EBMs) are inherently interpretable machine learning models, but are typically applied to low-dimensional data. In this study, we propose an interpretable machine learning framework that integrates data-driven feature extraction based on Convolutional Neural Networks (CNNs) with the intrinsic transparency of EBMs for AD diagnosis and prediction. The framework enables interpretation at both the group level and the individual level by identifying imaging biomarkers contributing to predictions. We validated the framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, achieving an area-under-the-curve (AUC) of 0.969 for AD vs. control classification and 0.750 for MCI conversion prediction. External validation was performed on an independent cohort, yielding AUCs of 0.871 for AD vs. subjective cognitive decline (SCD) classification and 0.666 for MCI conversion prediction. The proposed framework achieves performance comparable to state-of-the-art black-box models while offering transparent decision-making, a critical requirement for clinical translation. Our code is available at: https://gitlab.com/radiology/neuro/interpretable_ad_classification.
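A minimal sketch of the general pipeline described (a frozen CNN as feature extractor feeding an Explainable Boosting Machine), assuming torchvision for the backbone and the open-source InterpretML `interpret` package for the EBM; the backbone choice, feature dimension, and toy data are assumptions of this sketch, not the authors' setup.

```python
import numpy as np
import torch
import torchvision.models as models
from interpret.glassbox import ExplainableBoostingClassifier

# Frozen CNN backbone as a feature extractor (ResNet-18 stands in for the paper's CNN).
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()          # expose the 512-d pooled features
backbone.eval()


@torch.no_grad()
def extract_features(images):
    """images: float tensor (N, 3, 224, 224) -> (N, 512) numpy feature matrix."""
    return backbone(images).cpu().numpy()


# Toy stand-in data: 64 "scans" with binary labels.
images = torch.randn(64, 3, 224, 224)
labels = np.random.randint(0, 2, size=64)

features = extract_features(images)
# interactions=0 keeps this toy fit fast; the EBM remains a glass-box additive model.
ebm = ExplainableBoostingClassifier(interactions=0)
ebm.fit(features, labels)
print(ebm.predict_proba(features)[:3])
# ebm.explain_global() and ebm.explain_local(features[:3]) expose the per-feature terms.
```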

Citations: 0
HF-VLP: A multimodal vision-language pre-trained model for diagnosing heart failure.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-04 DOI: 10.1016/j.compmedimag.2026.102719
Huiting Ma, Dengao Li, Guiji Zhao, Li Liu, Jian Fu, Xiaole Fan, Zhe Zhang, Yuchen Liang

A considerable increase in the incidence of heart failure (HF) has recently posed major challenges to the medical field, underscoring the urgent need for early detection and intervention. Medical vision-and-language pretraining learns general representations from medical images and texts and shows great prospects in multimodal data-based diagnosis. During model pretraining, patient images may contain multiple symptoms simultaneously, and medical research faces a considerable challenge from noisy labels owing to factors such as differences among experts and machine-extracted labels. Furthermore, parameter-efficient fine-tuning (PEFT) is important for promoting model development. To address these challenges, we developed a multimodal vision-language pretrained model for HF, called HF-VLP. In particular, a label calibration loss is adopted to solve the labeling noise problem of multisource pretraining data. During pretraining, the data labels are calibrated in real time using the correlation between the labels and prediction confidence. Considering the efficient transferability of the pretrained model, a PEFT method called decomposed singular value weight-decomposed low-rank adaptation is developed. It can learn the downstream data distribution quickly by fine-tuning <1% of the parameters, achieving a better diagnostic rate than zero-shot inference. Simultaneously, the developed model fuses chest X-ray image features and radiology report features through the dynamic fusion graph module, enhancing the interaction and expressiveness of multimodal information. The validity of the model is verified on multiple medical datasets. The average AUC of multisymptom prediction in the Open-I dataset and the hospital dataset PPL-CXR reached 83.67% and 91.28%, respectively. The developed model can accurately classify the symptoms of patients, thereby assisting doctors in diagnosis.
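The exact label-calibration rule is not given in the abstract; the sketch below is one hypothetical way to blend noisy multi-label targets with the model's own predictions in proportion to prediction confidence, intended only to make the idea concrete (the blending rule and `max_shift` coefficient are assumptions).

```python
import torch


def calibrate_labels(noisy_labels, pred_probs, max_shift=0.5):
    """Blend noisy multi-label targets with model predictions.

    noisy_labels, pred_probs: tensors of shape (batch, num_symptoms) in [0, 1].
    The more confident the model is (probability far from 0.5), the more the
    target is pulled toward the prediction, up to `max_shift`.
    """
    confidence = (pred_probs - 0.5).abs() * 2.0   # 0 = unsure, 1 = certain
    weight = max_shift * confidence               # per-label blending weight
    return (1.0 - weight) * noisy_labels + weight * pred_probs


# Toy example: three symptoms, one noisy positive the model strongly doubts.
labels = torch.tensor([[1.0, 0.0, 1.0]])
probs = torch.tensor([[0.95, 0.10, 0.05]])
print(calibrate_labels(labels, probs))   # the third label is pulled toward 0
```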

Citations: 0
Learning geometric and visual features for medical image segmentation with vision GNN.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-03 DOI: 10.1016/j.compmedimag.2026.102720
Xinhong Li, Geng Chen, Yuanfeng Wu, Haotian Jiang, Tao Zhou, Yi Zhou, Wentao Zhu

As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and the objects within it as grid-structured data, while insufficient attention is given to the relationships between the objects to be segmented. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During the segmentation process, our model first represents the image as a graph and then utilizes the encoder to extract multi-level graph features and image features. Finally, our hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets across three types of lesions: polyps, skin lesions, and retinal vessels. The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.
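To make the "image as a graph" step concrete, here is a minimal sketch that builds a k-nearest-neighbour graph over patch features and performs one max-aggregation step, in the spirit of Vision GNN; the value of k and the aggregation rule are illustrative assumptions, not MedSegViG's implementation.

```python
import torch


def knn_graph(patch_features, k=8):
    """Build a k-nearest-neighbour graph over patch features.

    patch_features: (num_patches, dim) tensor, one node per image patch.
    Returns an index tensor of shape (num_patches, k) with each node's neighbours.
    """
    dists = torch.cdist(patch_features, patch_features)   # pairwise distances
    dists.fill_diagonal_(float("inf"))                     # exclude self-loops
    return dists.topk(k, largest=False).indices


def aggregate_neighbours(patch_features, neighbours):
    """One max-aggregation step over the graph (a basic graph-convolution flavour)."""
    gathered = patch_features[neighbours]                  # (N, k, dim)
    return torch.maximum(patch_features, gathered.max(dim=1).values)


# Toy example: a 14x14 grid of patches with 64-d features.
feats = torch.randn(196, 64)
nbrs = knn_graph(feats, k=8)
print(aggregate_neighbours(feats, nbrs).shape)   # torch.Size([196, 64])
```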

Citations: 0
A knowledge-guided and uncertainty-calibrated multimodal framework for fracture diagnosis and radiology report generation
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-01 DOI: 10.1016/j.compmedimag.2026.102709
Riadh Bouslimi
Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces UG-GraphT5 (Uncertainty-Guided Graph Transformer for Radiology Report Generation), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.
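The abstract does not specify how the Bayesian uncertainty is estimated; Monte Carlo dropout is one common approximation, sketched below on a toy classifier with predictive entropy as the uncertainty score that could drive hedged report wording. The model, dropout rate, and number of stochastic passes are assumptions, not UG-GraphT5's implementation.

```python
import torch
import torch.nn as nn


class SmallClassifier(nn.Module):
    """Toy fracture classifier with dropout so that stochastic forward passes differ."""

    def __init__(self, in_dim=128, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(p=0.3),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)


@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    """Monte Carlo dropout: keep dropout active at inference and average softmax outputs.

    Returns the mean prediction and its predictive entropy (an uncertainty score)."""
    model.train()   # keeps dropout layers stochastic during the repeated passes
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)
    entropy = -(mean * (mean + 1e-12).log()).sum(dim=-1)
    return mean, entropy


model = SmallClassifier()
features = torch.randn(4, 128)          # stand-in for image features
mean, entropy = mc_dropout_predict(model, features)
print(mean.shape, entropy)              # high entropy -> hedge wording in the report
```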
Citations: 0
Initial evaluation of a mixed-reality system for image-guided navigation during percutaneous liver tumor ablation
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-02-01 DOI: 10.1016/j.compmedimag.2026.102714
Dominik Spinczyk, Grzegorz Rosiak, Jarosław Żyłkowski, Krzysztof Milczarek, Dariusz Konecki, Karol Zaczkowski, Agata Tomaszewska, Łukasz Przepióra, Anna Wolińska-Sołtys, Piotr Sperka, Dawid Hajda, Ewa Piętka
Minimally invasive ablation is a challenge for contemporary interventional radiology. This study aimed to investigate the feasibility of utilizing a mixed-reality system for this type of treatment. A HoloLens mixed reality and optical tracking system, which supports diagnosis, planning, and procedure implementation, was used for percutaneous liver tumor ablation. The system differentiated pathological liver changes at the diagnostic stage, allowing for the selection of the entry point and target during planning. Meanwhile, it provided a real-time fusion of intraoperative ultrasound images with a pre-operative hologram during the procedure. Additionally, the collision detection module enabled the detection of collisions between the ablative needle and anatomical structures, utilizing the actual needle trajectory. The system was evaluated in 11 patients with cancerous liver lesions. The mean accuracy of target point registration, selected at the planning stage, was 2.8 mm during the procedure's supporting stage. Additionally, operator depth perception improved, the effective needle trajectory was shortened, and the radiation dose was reduced for both the patient and the operator due to improved visibility of the needle within the patient’s body. A generally improved understanding of the mutual spatial relationship between anatomical structures was observed compared to the classical two-dimensional view, along with improved depth perception of the operating field. An additional advantage indicated by the operators was the real-time highlighting of anatomical structures susceptible to damage by the needle trajectory, such as blood vessels, bile ducts, and the lungs, which lowers the risk of complications.
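As a simplified illustration of the collision-detection idea (not the system's actual algorithm), the sketch below samples points along a straight needle path and flags proximity to a point cloud representing an anatomical structure; the safety margin and sampling density are assumptions.

```python
import numpy as np


def needle_collision(needle_tip, needle_entry, structure_points, margin_mm=2.0, steps=50):
    """Flag a potential collision if any sampled point along the straight needle
    path comes within `margin_mm` of a point sampled from an anatomical structure.

    needle_tip, needle_entry: (3,) arrays in millimetres.
    structure_points: (N, 3) array, e.g. vertices of a segmented vessel surface.
    """
    ts = np.linspace(0.0, 1.0, steps)[:, None]
    path = needle_entry[None, :] * (1 - ts) + needle_tip[None, :] * ts   # (steps, 3)
    # Distance from every path sample to every structure point.
    dists = np.linalg.norm(path[:, None, :] - structure_points[None, :, :], axis=-1)
    return bool(dists.min() < margin_mm)


# Toy example: a vessel point cloud lying off to the side of the planned trajectory.
vessel = np.random.normal(loc=[10.0, 0.0, 25.0], scale=1.0, size=(500, 3))
print(needle_collision(np.array([0.0, 0.0, 50.0]), np.array([0.0, 0.0, 0.0]), vessel))
```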
Citations: 0
XGeM: A multi-prompt foundation model for multimodal medical data generation.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-30 DOI: 10.1016/j.compmedimag.2026.102718
Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda

The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM by first benchmarking it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at https://cosbidev.github.io/XGeM/.
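A minimal sketch of the multi-prompt idea as described (conditioning on arbitrary subsets of input modalities): during training, a random non-empty subset of modality embeddings in a shared latent space is kept and fused into one conditioning vector. The averaging fusion and keep probability are assumptions, not XGeM's implementation.

```python
import random

import torch


def sample_prompt_subset(modality_embeddings, p_keep=0.5):
    """Multi-prompt-style conditioning: randomly keep a non-empty subset of the
    available modality embeddings and average them into a single conditioning vector.

    modality_embeddings: dict mapping modality name -> (dim,) tensor in a shared space.
    """
    names = list(modality_embeddings)
    kept = [name for name in names if random.random() < p_keep]
    if not kept:                          # guarantee at least one conditioning modality
        kept = [random.choice(names)]
    stacked = torch.stack([modality_embeddings[name] for name in kept])
    return kept, stacked.mean(dim=0)


# Toy example with three modalities embedded in a 16-d shared latent space.
emb = {
    "frontal_xray": torch.randn(16),
    "lateral_xray": torch.randn(16),
    "report": torch.randn(16),
}
kept, condition = sample_prompt_subset(emb)
print(kept, condition.shape)
```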

Citations: 0
MRI-based deep learning model predicts recurrent nasopharyngeal carcinoma in post-radiation nasopharyngeal necrosis.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-29 DOI: 10.1016/j.compmedimag.2026.102711
Chao Lin, Jiong-Lin Liang, Jia Guo, Yu-Long Xie, Weidong Cheng, Guo-Heng Huang, Qing-Wen Lin, Jian-Wu Chen, Tong Xiang, Hai-Qiang Mai, Qi Yang

Background: The pretreatment identification of post-radiation nasopharyngeal necrosis (PRNN) combined with recurrent nasopharyngeal carcinoma (referred to as cancer-infiltrative PRNN) is crucial for the diagnosis and treatment of PRNN. As the first study to identify recurrent nasopharyngeal carcinoma in patients with PRNN, we aimed to develop a deep learning (DL)-based predictive model using routine MRI to distinguish cancer-infiltrative PRNN from cancer-free PRNN.

Methods: MRIs of 437 patients with PRNN were manually labeled and randomly divided into training and validation cohorts. Video Swin Transformer and Multilayer Perceptron were employed to construct the DL model. The integrated DL and clinical model (DC-combined model) and the integrated radiomics and clinical model (RC-combined model) were constructed using linear weighted fusion of the prediction results from the two models. The predictive value of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity.

Results: The DC-combined model significantly outperformed the radiologists in terms of AUC (0.83 vs. 0.60, p < 0.001), accuracy (0.78 vs. 0.60, p = 0.002), and sensitivity (0.86 vs. 0.62, p = 0.002) in the validation cohort. The DC-combined model showed the highest validation sensitivity of 0.86 (95 % CI 0.77-0.94), whereas the RC-combined model demonstrated the highest specificity of 0.88 (95 % CI 0.81-0.96).

Conclusions: Our DC-combined model based on DL can noninvasively distinguish cancer-infiltrative PRNN from cancer-free PRNN with higher AUC, accuracy, and sensitivity than those of radiologists and better sensitivity than that of the RC-combined model based on radiomics.
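The Methods section above describes linear weighted fusion of the two models' prediction scores; a minimal sketch with made-up probabilities is shown below (the fusion weight and toy numbers are assumptions, and scikit-learn is assumed for the AUC computation).

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def fuse_predictions(p_deep, p_clinical, w=0.6):
    """Linear weighted fusion of two models' positive-class probabilities."""
    return w * np.asarray(p_deep) + (1.0 - w) * np.asarray(p_clinical)


# Toy validation-style example with made-up probabilities and labels.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_deep = np.array([0.80, 0.30, 0.65, 0.55, 0.40, 0.20, 0.90, 0.35])
p_clin = np.array([0.70, 0.45, 0.60, 0.70, 0.25, 0.30, 0.60, 0.50])

p_combined = fuse_predictions(p_deep, p_clin, w=0.6)
print("AUC (combined):", roc_auc_score(y_true, p_combined))
```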

Citations: 0
A deep learning-based automated pipeline for colorectal cancer detection in contrast-enhanced CT images.
IF 4.9 CAS Tier 2 (Medicine) Q1 ENGINEERING, BIOMEDICAL Pub Date: 2026-01-26 DOI: 10.1016/j.compmedimag.2026.102717
Chenhui Qiu, Sarah Miller, Barathi Subramanian, Angela Ryu, Haiyu Zhang, George A Fisher, Nigam H Shah, John Mongan, Curtis Langlotz, Peter Poullos, Jeanne Shen

Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and a leading cause of cancer-related mortality. This study aims to investigate an automatic detection pipeline for identification and localization of the primary CRC in portal venous phase contrast-enhanced CT scans, which is a crucial first step for downstream CRC staging, prognostication, and treatment planning. We propose a deep learning-based automated detection pipeline using YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. Additionally, a scale-adaptive loss function, which introduces an adaptive coefficient and a scaling factor to adaptively measure the Intersection over Union (IoU) and center point distance for improving box regression performance, was designed to further improve detection performance. The proposed pipeline achieved a recall of 0.8092, precision of 0.8187, and F-1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation) and a recall of 0.9949, precision of 0.9894, and F-1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our pipeline, when trained on a patient-level in-house dataset, obtained a recall of 0.8283, precision of 0.8414, and F-1 score of 0.8348 and, when trained on a slice-level in-house dataset, achieved a recall of 0.6897, precision of 0.7888, and F-1 score of 0.7358, outperforming existing representative detection methods. The superior CRC detection performance on the in-house CT dataset and state-of-the-art generalization performance on the public dataset (with a 31.97 percentage-point improvement in detection sensitivity (recall) over the next closest state-of-the-art method) highlight the potential translational value of our pipeline for CRC clinical decision support, conditional upon validation in larger cohorts.
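The scale-adaptive loss is described only at a high level; the sketch below is a generic DIoU-style box loss with a hypothetical size-dependent scaling factor that up-weights small boxes, meant to illustrate the flavour of such a loss rather than the authors' exact formulation.

```python
import torch


def scale_adaptive_box_loss(pred, target, eps=1e-7):
    """DIoU-style box loss with a hypothetical scale factor favouring small boxes.

    pred, target: (N, 4) tensors of boxes in (x1, y1, x2, y2) format.
    """
    # Intersection over union.
    x1 = torch.maximum(pred[:, 0], target[:, 0])
    y1 = torch.maximum(pred[:, 1], target[:, 1])
    x2 = torch.minimum(pred[:, 2], target[:, 2])
    y2 = torch.minimum(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Normalised centre-point distance, as in DIoU.
    centre_p = (pred[:, :2] + pred[:, 2:]) / 2
    centre_t = (target[:, :2] + target[:, 2:]) / 2
    centre_dist = ((centre_p - centre_t) ** 2).sum(dim=1)
    enc_x1 = torch.minimum(pred[:, 0], target[:, 0])
    enc_y1 = torch.minimum(pred[:, 1], target[:, 1])
    enc_x2 = torch.maximum(pred[:, 2], target[:, 2])
    enc_y2 = torch.maximum(pred[:, 3], target[:, 3])
    diag = (enc_x2 - enc_x1) ** 2 + (enc_y2 - enc_y1) ** 2 + eps

    # Hypothetical adaptive coefficient: smaller ground-truth boxes get more weight.
    scale = 1.0 + 1.0 / (1.0 + area_t.sqrt() / 32.0)
    return (scale * (1.0 - iou + centre_dist / diag)).mean()


pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]])
target = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
print(scale_adaptive_box_loss(pred, target))
```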

Citations: 0