C3MT: Confidence-Calibrated Contrastive Mean Teacher for semi-supervised medical image segmentation
Pub Date: 2026-02-06 | DOI: 10.1016/j.compmedimag.2026.102721
Xianmin Wang, Mingfeng Lin, Jing Li
Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposes a novel Confidence-Calibrated Contrastive Mean Teacher (C3MT) framework. First, C3MT introduces a Contrastive Learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a Confidence-Calibrated and Category-Aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. The implementation is publicly available at https://github.com/l1654485/C3MT.
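The confidence filtering and category-aligned mixing described above can be made concrete with a short sketch. The following is a minimal PyTorch illustration under assumed tensor shapes, with an illustrative confidence threshold and patch size; the function names are hypothetical and do not come from the released C3MT code.

```python
import torch

def filter_pseudo_labels(teacher_probs, tau=0.9):
    """Keep only pseudo-labels whose per-pixel confidence exceeds tau.
    teacher_probs: teacher softmax output, shape [B, C, H, W]."""
    conf, pseudo = teacher_probs.max(dim=1)      # confidence and hard label per pixel
    mask = conf > tau                            # unreliable pixels get masked out
    return pseudo, mask

def category_aligned_mix(img_a, lbl_a, img_b, lbl_b, patch=32):
    """Swap a random patch between two samples only when both patches share the
    same dominant pseudo-label category, keeping the mixed sample coherent."""
    _, H, W = img_a.shape
    y = torch.randint(0, H - patch + 1, (1,)).item()
    x = torch.randint(0, W - patch + 1, (1,)).item()
    cat_a = lbl_a[y:y + patch, x:x + patch].flatten().mode().values
    cat_b = lbl_b[y:y + patch, x:x + patch].flatten().mode().values
    if cat_a == cat_b:                           # same semantic category: allow the swap
        img_a, lbl_a = img_a.clone(), lbl_a.clone()
        img_a[:, y:y + patch, x:x + patch] = img_b[:, y:y + patch, x:x + patch]
        lbl_a[y:y + patch, x:x + patch] = lbl_b[y:y + patch, x:x + patch]
    return img_a, lbl_a

probs = torch.softmax(torch.randn(2, 4, 64, 64), dim=1)
pseudo, mask = filter_pseudo_labels(probs, tau=0.5)   # loose threshold just for the toy example
print(f"fraction of pixels kept: {mask.float().mean():.2f}")
```

Pixels outside the mask would simply be excluded from the unsupervised loss, which is the sense in which unreliable pseudo-labels are filtered out.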
{"title":"C3MT: Confidence-Calibrated Contrastive Mean Teacher for semi-supervised medical image segmentation.","authors":"Xianmin Wang, Mingfeng Lin, Jing Li","doi":"10.1016/j.compmedimag.2026.102721","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2026.102721","url":null,"abstract":"<p><p>Semi-supervised learning is crucial for medical image segmentation due to the scarcity of labeled data. However, existing methods that combine consistency regularization and pseudo-labeling often suffer from inadequate feature representation, suboptimal subnetwork disagreement, and noisy pseudo-labels. To address these limitations, this paper proposed a novel Confidence-Calibrated Contrastive Mean Teacher (C3MT) framework. First, C3MT introduces a Contrastive Learning-based co-training strategy, where an adaptive disagreement adjustment mechanism dynamically regulates the divergence between student models. This not only preserves representation diversity but also stabilizes the training process. Second, C3MT introduces a Confidence-Calibrated and Category-Aligned uncertainty-guided region mixing strategy. The confidence-calibrated mechanism filters out unreliable pseudo-labels, whereas the category-aligned design restricts region swapping to patches of the same semantic category, preserving anatomical coherence and preventing semantic inconsistency in the mixed samples. Together, these components significantly enhance feature representation, training stability, and segmentation quality, especially in challenging low-annotation scenarios. Extensive experiments on ACDC, Synapse, and LA datasets show that C3MT consistently outperforms recent state-of-the-art methods. For example, on the ACDC dataset with 20% labeled data, C3MT achieves up to a 4.3% improvement in average Dice score and a reduction in HD95 of more than 1.0 mm compared with strong baselines. The implementation is publicly available at https://github.com/l1654485/C3MT.</p>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"102721"},"PeriodicalIF":4.9,"publicationDate":"2026-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VFA-Net3D: A zero-shot vascular flow-guided 3D network for brain vessel segmentation in acute ischemic stroke
Pub Date: 2026-02-06 | DOI: 10.1016/j.compmedimag.2026.102712
Asim Zaman, Mazen M. Yassin, Rashid Khan, Faizan Ahmad, Guangtao Huang, Yongkang Shi, Ziran Chen, Yan Kang
Timely intervention in acute ischemic stroke (AIS) is critical, with approximately 1.9 million neurons lost per minute. Clinical time-of-flight magnetic resonance angiography (TOF-MRA) protocols, designed for rapid acquisition within a few minutes, introduce substantial domain shifts compared to high-resolution research datasets. These include reduced resolution, a lower signal-to-noise ratio, partial-volume effects, and motion artifacts, which are compounded by stroke-specific vascular abnormalities. Conventional segmentation models often fail under such conditions due to limited robustness to domain variability. We propose the Vascular Flow-Attention Network (VFA-Net3D), a fully 3D deep neural network designed for zero-shot vessel segmentation in AIS, enabling knowledge transfer from annotated healthy TOF-MRA scans without retraining on pathological or low-quality clinical data. The architecture integrates five novel modules: (1) Flow-Pattern Attention for vascular continuity; (2) Multi-Scale Context Aggregation using dilated attention; (3) Vascular Flow Feature Refinement for adaptive attention enhancement; (4) Boundary-Guided Skip for precise boundary delineation; and (5) Flow-Refined Up-sampling to recover fine vessel details. The proposed model, trained exclusively on healthy TOF-MRA scans from four vendors (1.5 T and 3 T), was evaluated across three experimental configurations and significantly outperformed state-of-the-art models. By explicitly encoding vascular domain knowledge (continuity, boundaries, and flow topology) into its architecture, VFA-Net3D functions as a knowledge-guided system, enabling robust zero-shot generalization across clinical domains. VFA-Net3D presents a robust and clinically deployable solution for AIS vessel segmentation, supporting faster and more accurate diagnosis and treatment planning.
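Among the five modules, the Multi-Scale Context Aggregation idea is the most straightforward to sketch. The block below is a generic 3D illustration of parallel dilated convolutions followed by a channel gate; the channel count, dilation rates, and class name are assumptions for the example and not the published VFA-Net3D implementation.

```python
import torch
import torch.nn as nn

class MultiScaleContext3D(nn.Module):
    """Illustrative 3D multi-scale aggregation: parallel dilated convolutions
    merged through a channel-attention gate, with a residual connection."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        mid = channels * len(dilations)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Conv3d(mid, mid, kernel_size=1), nn.Sigmoid()
        )
        self.fuse = nn.Conv3d(mid, channels, kernel_size=1)

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        feats = feats * self.gate(feats)      # re-weight each scale's channels
        return self.fuse(feats) + x           # residual keeps the original features

vol = torch.randn(1, 16, 32, 64, 64)          # toy [B, C, D, H, W] TOF-MRA patch
print(MultiScaleContext3D(16)(vol).shape)     # torch.Size([1, 16, 32, 64, 64])
```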
{"title":"VFA-Net3D: A zero-shot vascular flow-guided 3D network for brain vessel segmentation in acute ischemic stroke","authors":"Asim Zaman , Mazen M. Yassin , Rashid Khan , Faizan Ahmad , Guangtao Huang , Yongkang Shi , Ziran Chen , Yan Kang","doi":"10.1016/j.compmedimag.2026.102712","DOIUrl":"10.1016/j.compmedimag.2026.102712","url":null,"abstract":"<div><div>Timely intervention in acute ischemic stroke (AIS) is critical, with approximately 1.9 million neurons lost per minute. Clinical time-of-flight magnetic resonance angiography (TOF-MRA) protocols, designed for rapid acquisition within a few minutes, introduce substantial domain shifts compared to high-resolution research datasets. These include reduced resolution, a lower signal-to-noise ratio, partial-volume effects, and motion artifacts, which are compounded by stroke-specific vascular abnormalities. Conventional segmentation models often fail under such conditions due to limited robustness to domain variability. We propose Vascular Flow-Attention Network (VFA-Net), a fully 3D deep neural network designed for zero-shot vessel segmentation in AIS, enabling knowledge transfer from annotated healthy TOF-MRA scans without retraining on pathological or low-quality clinical data. The architecture integrates five novel modules: (1) Flow-Pattern Attention for vascular continuity; (2) Multi-Scale Context Aggregation using dilated attention; (3) Vascular Flow Feature Refinement for adaptive attention enhancement; (4) Boundary-Guided Skip for precise boundary delineation; and (5) Flow-Refined Up-sampling to recover fine vessel details. The proposed model, trained exclusively on healthy TOF-MRA scans from four vendors (1.5 T and 3 T), was evaluated across three experimental configurations and significantly outperformed state-of-the-art models. By explicitly encoding vascular domain knowledge (continuity, boundaries, and flow topology) into its architecture, VFA-Net3D functions as a knowledge-guided system, enabling robust zero-shot generalization across clinical domains. VFA-Net3D presents a robust and clinically deployable solution for AIS vessel segmentation, supporting faster and more accurate diagnosis and treatment planning.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"129 ","pages":"Article 102712"},"PeriodicalIF":4.9,"publicationDate":"2026-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146147488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An interpretable machine learning framework with data-informed imaging biomarkers for diagnosis and prediction of Alzheimer's disease
Pub Date: 2026-02-06 | DOI: 10.1016/j.compmedimag.2026.102722
Wenjie Kang, Bo Li, Lize C Jiskoot, Peter Paul De Deyn, Geert Jan Biessels, Huiberdina L Koek, Jurgen A H R Claassen, Huub A M Middelkoop, Wiesje M van der Flier, Willemijn J Jansen, Stefan Klein, Esther E Bron
Machine learning methods based on imaging and other clinical data have shown great potential for improving the early and accurate diagnosis of Alzheimer's disease (AD). However, for most deep learning models, especially those operating on high-dimensional imaging data, the decision-making process remains largely opaque, which limits clinical applicability. Explainable Boosting Machines (EBMs) are inherently interpretable machine learning models, but are typically applied to low-dimensional data. In this study, we propose an interpretable machine learning framework that integrates data-driven feature extraction based on Convolutional Neural Networks (CNNs) with the intrinsic transparency of EBMs for AD diagnosis and prediction. The framework enables interpretation at both the group and individual levels by identifying the imaging biomarkers contributing to predictions. We validated the framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, achieving an area under the curve (AUC) of 0.969 for AD vs. control classification and 0.750 for MCI conversion prediction. External validation was performed on an independent cohort, yielding AUCs of 0.871 for AD vs. subjective cognitive decline (SCD) classification and 0.666 for MCI conversion prediction. The proposed framework achieves performance comparable to state-of-the-art black-box models while offering transparent decision-making, a critical requirement for clinical translation. Our code is available at: https://gitlab.com/radiology/neuro/interpretable_ad_classification.
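The two-stage design, with data-driven features feeding an inherently interpretable model, can be sketched briefly. The example below assumes the open-source interpret package for the EBM and uses random numbers as stand-ins for CNN-derived imaging features; every name in it is illustrative and unrelated to the released code.

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(0)

# Stand-ins for CNN-derived imaging features (in the paper these would be
# learned regional measures extracted from structural MRI).
X = rng.normal(size=(200, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

global_expl = ebm.explain_global()            # per-feature shape functions and importances
local_expl = ebm.explain_local(X[:1], y[:1])  # per-case feature contributions
print(ebm.predict_proba(X[:3]))
```

The global explanation corresponds to the group-level interpretation mentioned in the abstract, and the local explanation to the individual-level one.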
{"title":"An interpretable machine learning framework with data-informed imaging biomarkers for diagnosis and prediction of Alzheimer's disease.","authors":"Wenjie Kang, Bo Li, Lize C Jiskoot, Peter Paul De Deyn, Geert Jan Biessels, Huiberdina L Koek, Jurgen A H R Claassen, Huub A M Middelkoop, Wiesje M van der Flier, Willemijn J Jansen, Stefan Klein, Esther E Bron","doi":"10.1016/j.compmedimag.2026.102722","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2026.102722","url":null,"abstract":"<p><p>Machine learning methods based on imaging and other clinical data have shown great potential for improving the early and accurate diagnosis of Alzheimer's disease (AD). However, for most deep learning models, especially those including high-dimensional imaging data, the decision-making process remains largely opaque which limits clinical applicability. Explainable Boosting Machines (EBMs) are inherently interpretable machine learning models, but are typically applied to low-dimensional data. In this study, we propose an interpretable machine learning framework that integrates data-driven feature extraction based on Convolutional Neural Networks (CNNs) with the intrinsic transparency of EBMs for AD diagnosis and prediction. The framework enables interpretation at both the group-level and individual-level by identifying imaging biomarkers contributing to predictions. We validated the framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, achieving an area-under-the-curve (AUC) of 0.969 for AD vs. control classification and 0.750 for MCI conversion prediction. External validation was performed on an independent cohort, yielding AUCs of 0.871 for AD vs. subjective cognitive decline (SCD) classification and 0.666 for MCI conversion prediction. The proposed framework achieves performance comparable to state-of-the-art black-box models while offering transparent decision-making, a critical requirement for clinical translation. Our code is available at: https://gitlab.com/radiology/neuro/interpretable_ad_classification.</p>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"102722"},"PeriodicalIF":4.9,"publicationDate":"2026-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HF-VLP: A multimodal vision-language pre-trained model for diagnosing heart failure
Pub Date: 2026-02-04 | DOI: 10.1016/j.compmedimag.2026.102719
Huiting Ma, Dengao Li, Guiji Zhao, Li Liu, Jian Fu, Xiaole Fan, Zhe Zhang, Yuchen Liang
A considerable increase in the incidence of heart failure (HF) has recently posed major challenges to the medical field, underscoring the urgent need for early detection and intervention. Medical vision-and-language pretraining learns general representations from medical images and texts and shows great promise for multimodal data-based diagnosis. During model pretraining, patient images may contain multiple symptoms simultaneously, and noisy labels, arising from inter-expert variability and machine-extracted annotations, pose a considerable challenge. Furthermore, parameter-efficient fine-tuning (PEFT) is important for promoting model development. To address these challenges, we developed a multimodal vision-language pretrained model for HF, called HF-VLP. In particular, a label calibration loss is adopted to address label noise in multisource pretraining data: during pretraining, the data labels are calibrated in real time using the correlation between the labels and the prediction confidence. To enable efficient transfer of the pretrained model, a PEFT method called decomposed singular value weight-decomposed low-rank adaptation is developed; it adapts quickly to the downstream data distribution by fine-tuning fewer than 1% of the parameters, yielding better diagnostic accuracy than zero-shot inference. In addition, the model fuses chest X-ray image features and radiology report features through a dynamic fusion graph module, strengthening the interaction and expressiveness of multimodal information. The model is validated on multiple medical datasets: the average AUC of multisymptom prediction reaches 83.67% on the Open-I dataset and 91.28% on the hospital dataset PPL-CXR. The developed model accurately classifies patient symptoms, thereby assisting clinicians in diagnosis.
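As one way to picture the PEFT component, the sketch below wraps a single frozen linear layer with a generic weight-decomposed low-rank adapter. The rank, scaling, and class name are placeholders, and this is not the authors' decomposed-singular-value variant; it only illustrates why so few parameters end up trainable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDecomposedLoRALinear(nn.Module):
    """Frozen linear layer with a weight-decomposed low-rank adapter:
    the direction comes from (W0 + B A) normalized per output row and the
    magnitude is a learned per-row vector; only A, B and the magnitude train."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                        # keep pretrained weights frozen
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.scale = alpha / rank
        self.magnitude = nn.Parameter(base.weight.detach().norm(dim=1))

    def forward(self, x):
        w = self.base.weight + self.scale * (self.B @ self.A)   # adapted weight
        w = w / (w.norm(dim=1, keepdim=True) + 1e-6)            # unit-norm rows (direction)
        w = self.magnitude.unsqueeze(1) * w                     # learned magnitude
        return F.linear(x, w, self.base.bias)

layer = WeightDecomposedLoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction of this layer: {trainable / total:.2%}")   # about 2% here
```

Adapting only selected layers of a large backbone is how whole-model trainable fractions below 1%, as quoted above, are typically reached.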
{"title":"HF-VLP: A multimodal vision-language pre-trained model for diagnosing heart failure.","authors":"Huiting Ma, Dengao Li, Guiji Zhao, Li Liu, Jian Fu, Xiaole Fan, Zhe Zhang, Yuchen Liang","doi":"10.1016/j.compmedimag.2026.102719","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2026.102719","url":null,"abstract":"<p><p>A considerable increase in the incidence of heart failure (HF) has recently posed major challenges to the medical field, underscoring the urgent need for early detection and intervention. Medical vision-and-language pretraining learns general representation from medical images and texts and shows great prospects in multimodal data-based diagnosis. During model pretraining, patient images may contain multiple symptoms simultaneously and medical research faces a considerable challenge from noisy labels owing to factors such as differences among experts and machine-extracted labels. Furthermore, parameter-efficient fine-tuning (PEFT) is important for promoting model development. To address these challenges, we developed a multimodal vision-language pretrained model for HF, called HF-VLP. In particular, label calibration loss is adopted to solve the labeling noise problem of multisource pretraining data. During pretraining, the data labels are calibrated in real time using the correlation between the labels and prediction confidence. Considering the efficient migratability of the pretrained model, a PEFT method called decomposed singular value weight-decomposed low-rank adaptation is developed. It can learn the downstream data distribution quickly by fine-tuning <1% of the parameters to obtain a better diagnosis rate than zero-shot. Simultaneously, the developed model fuses chest X-ray image features and radiology report features through the dynamic fusion graph module, enhancing the interaction and expression ability of multimodal information. The validity of the model is verified on multiple medical datasets. The average AUC of multisymptom prediction in the Open-I dataset and the hospital dataset PPL-CXR reached 83.67% and 91.28%, respectively. The developed model can accurately classify the symptoms of patients, thereby assisting doctors in diagnosis.</p>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"102719"},"PeriodicalIF":4.9,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning geometric and visual features for medical image segmentation with vision GNN
Pub Date: 2026-02-03 | DOI: 10.1016/j.compmedimag.2026.102720
Xinhong Li, Geng Chen, Yuanfeng Wu, Haotian Jiang, Tao Zhou, Yi Zhou, Wentao Zhu
As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and the objects within it as grid-structured data, paying insufficient attention to the relationships between the objects to be segmented. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During the segmentation process, our model first represents the image as a graph and then utilizes the encoder to extract multi-level graph features and image features. Finally, our hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets across three types of lesions: polyps, skin lesions, and retinal vessels. The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.
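The graph construction at the heart of a Vision GNN can be shown compactly: patches become nodes and each node is connected to its nearest neighbours in feature space. The snippet below is a toy sketch with an assumed patch size and neighbour count, not the MedSegViG pipeline.

```python
import torch

def image_to_patch_graph(img, patch=16, k=8):
    """Turn an image into a graph: nodes are mean-pooled non-overlapping patches,
    edges connect each node to its k nearest neighbours in feature space."""
    C, H, W = img.shape
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)  # [C, H/p, W/p, p, p]
    nodes = patches.mean(dim=(-1, -2)).permute(1, 2, 0).reshape(-1, C)
    dist = torch.cdist(nodes, nodes)                   # pairwise feature distances
    dist.fill_diagonal_(float("inf"))                  # exclude self-loops
    knn = dist.topk(k, largest=False).indices          # [N, k] neighbour indices
    src = torch.arange(nodes.size(0)).repeat_interleave(k)
    edge_index = torch.stack([src, knn.reshape(-1)])   # [2, N*k] COO edge list
    return nodes, edge_index

img = torch.randn(3, 224, 224)                         # toy RGB endoscopy frame
nodes, edges = image_to_patch_graph(img)
print(nodes.shape, edges.shape)                        # [196, 3] and [2, 1568]
```

In a full model the node features would come from a learned stem rather than raw pixels, and graph convolutions would then propagate information along these edges.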
{"title":"Learning geometric and visual features for medical image segmentation with vision GNN.","authors":"Xinhong Li, Geng Chen, Yuanfeng Wu, Haotian Jiang, Tao Zhou, Yi Zhou, Wentao Zhu","doi":"10.1016/j.compmedimag.2026.102720","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2026.102720","url":null,"abstract":"<p><p>As a fundamental task, medical image segmentation plays a crucial role in various clinical applications. In recent years, deep learning-based segmentation methods have achieved significant success. However, these methods typically represent the image and objects within it as grid-structural data, while insufficient attention is given to relationships between the objects to segment. To address this issue, we propose a novel model called MedSegViG, which consists of a hierarchical encoder based on Vision GNN (ViG) and a hybrid feature decoder. During the segmentation process, our model first represents the image as a graph and then utilizes the encoder to extract multi-level graph features and image features. Finally, our hybrid feature decoder fuses these features to generate the final segmentation map. To validate the effectiveness of the proposed model, we conducted extensive experiments on six datasets across three types of lesions: polyps, skin lesions, and retinal vessels. The results demonstrate that MedSegViG achieves superior segmentation accuracy, robustness, and generalizability.</p>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"102720"},"PeriodicalIF":4.9,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A knowledge-guided and uncertainty-calibrated multimodal framework for fracture diagnosis and radiology report generation
Pub Date: 2026-02-01 | DOI: 10.1016/j.compmedimag.2026.102709
Riadh Bouslimi
Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces UG-GraphT5 (Uncertainty-Guided Graph Transformer for Radiology Report Generation), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.
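The calibration figure quoted above (ECE) has a standard computation that is easy to show: predictions are grouped into equal-width confidence bins and the gap between confidence and accuracy is averaged, weighted by bin occupancy. The bin count and toy data below are assumptions for the example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: occupancy-weighted average |mean confidence - accuracy| over bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap        # weight by fraction of samples in the bin
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = rng.uniform(size=1000) < conf        # a roughly calibrated toy classifier
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```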
{"title":"A knowledge-guided and uncertainty-calibrated multimodal framework for fracture diagnosis and radiology report generation","authors":"Riadh Bouslimi","doi":"10.1016/j.compmedimag.2026.102709","DOIUrl":"10.1016/j.compmedimag.2026.102709","url":null,"abstract":"<div><div>Fracture diagnosis from radiographic imaging remains challenging, particularly in clinical settings with limited access to expert radiologists or standardized reporting practices. This work introduces <em>UG-GraphT5</em> (<em>Uncertainty-Guided Graph Transformer for Radiology Report Generation</em>), a unified multimodal framework for joint fracture classification and uncertainty-aware radiology report generation that explicitly treats diagnostic uncertainty as a central component guiding both reasoning and clinical communication. The proposed approach integrates visual representations, structured clinical knowledge derived from SNOMED CT, Bayesian uncertainty estimation, and guided natural language generation based on ClinicalT5, enabling adaptive multimodal fusion and calibrated language output. Evaluated on three radiological datasets comprising over 80,000 expert-annotated images and reports, UG-GraphT5 achieves improved fracture classification performance (F1-score of 82.6%), strong uncertainty calibration (ECE of 2.7%), and high-quality report generation (BLEU-4 of 0.356). Qualitative analysis and a reader study involving radiology trainees and experts further confirm that generated reports appropriately reflect diagnostic confidence through uncertainty-aware lexical modulation. An optimized clinical inference profile reduces inference latency by more than 40% without compromising diagnostic accuracy, highlighting the framework’s potential for interpretable, trustworthy, and deployment-aware AI-assisted radiology in resource-constrained clinical environments.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102709"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Initial evaluation of a mixed-reality system for image-guided navigation during percutaneous liver tumor ablation
Pub Date: 2026-02-01 | DOI: 10.1016/j.compmedimag.2026.102714
Dominik Spinczyk, Grzegorz Rosiak, Jarosław Żyłkowski, Krzysztof Milczarek, Dariusz Konecki, Karol Zaczkowski, Agata Tomaszewska, Łukasz Przepióra, Anna Wolińska-Sołtys, Piotr Sperka, Dawid Hajda, Ewa Piętka
Minimally invasive ablation is a challenge for contemporary interventional radiology. This study aimed to investigate the feasibility of utilizing a mixed-reality system for this type of treatment. A HoloLens mixed reality and optical tracking system, which supports diagnosis, planning, and procedure implementation, was used for percutaneous liver tumor ablation. The system differentiated pathological liver changes at the diagnostic stage, allowing for the selection of the entry point and target during planning. Meanwhile, it provided a real-time fusion of intraoperative ultrasound images with a pre-operative hologram during the procedure. Additionally, the collision detection module enabled the detection of collisions between the ablative needle and anatomical structures, utilizing the actual needle trajectory. The system was evaluated in 11 patients with cancerous liver lesions. The mean accuracy of target point registration, selected at the planning stage, was 2.8 mm during the procedure's supporting stage. Additionally, operator depth perception improved, the effective needle trajectory was shortened, and the radiation dose was reduced for both the patient and the operator due to improved visibility of the needle within the patient’s body. A generally improved understanding of the mutual spatial relationship between anatomical structures was observed compared to the classical two-dimensional view, along with improved depth perception of the operating field. An additional advantage indicated by the operators was the real-time highlighting of anatomical structures susceptible to damage by the needle trajectory, such as blood vessels, bile ducts, and the lungs, which lowers the risk of complications.
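In its simplest form, the collision check reduces to a distance query between the planned needle trajectory and points on a protected structure. The sketch below is a deliberately simplified geometric illustration with made-up coordinates and an assumed 5 mm safety margin; the clinical system works on tracked trajectories and segmented anatomy rather than synthetic points.

```python
import numpy as np

def min_distance_to_segment(points, p0, p1):
    """Minimum distance from each 3D point to the segment p0-p1 (needle trajectory)."""
    d = p1 - p0
    t = np.clip((points - p0) @ d / (d @ d), 0.0, 1.0)   # projection parameter in [0, 1]
    closest = p0 + t[:, None] * d
    return np.linalg.norm(points - closest, axis=1)

# Toy anatomy: points sampled near a vessel (mm), plus a planned entry-to-target path.
vessel = np.random.default_rng(1).normal(loc=[40.0, 20.0, 60.0], scale=3.0, size=(500, 3))
entry, target = np.array([0.0, 0.0, 0.0]), np.array([60.0, 30.0, 80.0])

margin_mm = 5.0
dists = min_distance_to_segment(vessel, entry, target)
if (dists < margin_mm).any():
    print(f"warning: trajectory passes within {dists.min():.1f} mm of the structure")
else:
    print("trajectory clear of the structure at the chosen margin")
```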
{"title":"Initial evaluation of a mixed-reality system for image-guided navigation during percutaneous liver tumor ablation","authors":"Dominik Spinczyk , Grzegorz Rosiak , Jarosław Żyłkowski , Krzysztof Milczarek , Dariusz Konecki , Karol Zaczkowski , Agata Tomaszewska , Łukasz Przepióra , Anna Wolińska-Sołtys , Piotr Sperka , Dawid Hajda , Ewa Piętka","doi":"10.1016/j.compmedimag.2026.102714","DOIUrl":"10.1016/j.compmedimag.2026.102714","url":null,"abstract":"<div><div>Minimally invasive ablation is a challenge for contemporary interventional radiology. This study aimed to investigate the feasibility of utilizing a mixed-reality system for this type of treatment. A HoloLens mixed reality and optical tracking system, which supports diagnosis, planning, and procedure implementation, was used for percutaneous liver tumor ablation. The system differentiated pathological liver changes at the diagnostic stage, allowing for the selection of the entry point and target during planning. Meanwhile, it provided a real-time fusion of intraoperative ultrasound images with a pre-operative hologram during the procedure. Additionally, the collision detection module enabled the detection of collisions between the ablative needle and anatomical structures, utilizing the actual needle trajectory. The system was evaluated in 11 patients with cancerous liver lesions. The mean accuracy of target point registration, selected at the planning stage, was 2.8 mm during the procedure's supporting stage. Additionally, operator depth perception improved, the effective needle trajectory was shortened, and the radiation dose was reduced for both the patient and the operator due to improved visibility of the needle within the patient’s body. A generally improved understanding of the mutual spatial relationship between anatomical structures was observed compared to the classical two-dimensional view, along with improved depth perception of the operating field. An additional advantage indicated by the operators was the real-time highlighting of anatomical structures susceptible to damage by the needle trajectory, such as blood vessels, bile ducts, and the lungs, which lowers the risk of complications.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102714"},"PeriodicalIF":4.9,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
XGeM: A multi-prompt foundation model for multimodal medical data generation
Pub Date: 2026-01-30 | DOI: 10.1016/j.compmedimag.2026.102718
Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda
The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM by first benchmarking it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at https://cosbidev.github.io/XGeM/.
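The shared latent space built via contrastive learning is typically trained with a symmetric InfoNCE objective; the sketch below shows that objective for a batch of paired image and report embeddings. The embedding size, batch size, and temperature are placeholders, not XGeM's configuration.

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """CLIP-style contrastive loss: matched image/report pairs are pulled together
    in the shared latent space, mismatched pairs are pushed apart."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # [B, B] cosine-similarity matrix
    targets = torch.arange(img.size(0))           # the diagonal holds the true pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy embeddings standing in for chest X-ray and report encoder outputs.
img_emb = torch.randn(8, 256, requires_grad=True)
txt_emb = torch.randn(8, 256, requires_grad=True)
loss = symmetric_info_nce(img_emb, txt_emb)
loss.backward()
print(float(loss))
```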
{"title":"XGeM: A multi-prompt foundation model for multimodal medical data generation.","authors":"Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda","doi":"10.1016/j.compmedimag.2026.102718","DOIUrl":"https://doi.org/10.1016/j.compmedimag.2026.102718","url":null,"abstract":"<p><p>The adoption of Artificial Intelligence in medical imaging holds great promise, yet it remains hindered by challenges such as data scarcity, privacy concerns, and the need for robust multimodal integration. While recent advances in generative modeling have enabled high-quality synthetic data generation, existing approaches are often limited to unimodal, unidirectional synthesis and therefore lack the ability to jointly synthesize multiple modalities while preserving clinical consistency. To address this challenge, we introduce XGeM, a 6.77-billion-parameter multimodal generative model designed to support flexible, any-to-any synthesis between medical data modalities. XGeM constructs a shared latent space via contrastive learning and introduces a novel Multi-Prompt Training strategy, enabling conditioning on arbitrary subsets of input modalities. This design allows the model to adapt to heterogeneous clinical inputs and generate multiple outputs jointly, preserving both semantic and structural coherence. We extensively validate XGeM by first benchmarking it against five competitors on the MIMIC-CXR dataset, a state-of-the-art dataset for multi-view Chest X-ray and radiological report generation. Secondly, we perform a Visual Turing Test with expert radiologists to assess the realism and clinical relevance of the generated data, ensuring alignment with real-world scenarios. Finally, we demonstrate how XGeM can support key medical data challenges such as anonymization, class imbalance, and data scarcity, underscoring its utility as a foundation model for medical data synthesis. Project page is at https://cosbidev.github.io/XGeM/.</p>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"102718"},"PeriodicalIF":4.9,"publicationDate":"2026-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146120948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MRI-based deep learning model predicts recurrent nasopharyngeal carcinoma in post-radiation nasopharyngeal necrosis
Pub Date: 2026-01-29 | DOI: 10.1016/j.compmedimag.2026.102711
Chao Lin, Jiong-Lin Liang, Jia Guo, Yu-Long Xie, Weidong Cheng, Guo-Heng Huang, Qing-Wen Lin, Jian-Wu Chen, Tong Xiang, Hai-Qiang Mai, Qi Yang
Background: The pretreatment identification of post-radiation nasopharyngeal necrosis (PRNN) combined with recurrent nasopharyngeal carcinoma (referred to as cancer-infiltrative PRNN) is crucial for the diagnosis and treatment of PRNN. As the first study to identify recurrent nasopharyngeal carcinoma in patients with PRNN, we aimed to develop a deep learning (DL)-based predictive model using routine MRI to distinguish cancer-infiltrative PRNN from cancer-free PRNN.
Methods: MRIs of 437 patients with PRNN were manually labeled and randomly divided into training and validation cohorts. Video Swin Transformer and Multilayer Perceptron were employed to construct the DL model. The integrated DL and clinical model (DCcombined model) and the integrated radiomics and clinical model (RCcombined model) were constructed using linear weighted fusion of the prediction results from the two models. The predictive value of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, and specificity.
Results: The DCcombined model significantly outperformed the radiologists in terms of AUC (0.83 vs. 0.60, p < 0.001), accuracy (0.78 vs. 0.60, p = 0.002), and sensitivity (0.86 vs. 0.62, p = 0.002) in the validation cohort. The DCcombined model showed the highest validation sensitivity of 0.86 (95 % CI 0.77-0.94), whereas the RCcombined model demonstrated the highest specificity of 0.88 (95 % CI 0.81-0.96).
Conclusions: Our DCcombined model based on DL can noninvasively distinguish cancer-infiltrative PRNN from cancer-free PRNN with higher AUC, accuracy, and sensitivity than those of radiologists and better sensitivity than that of the RCcombined model based on radiomics.
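The linear weighted fusion described in the Methods can be shown in a few lines: the two models' probabilities are combined with a single weight and the result is scored with AUC. The weight and the toy predictions below are illustrative only and not the study's fitted values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                  # 1 = cancer-infiltrative PRNN

# Stand-ins for per-patient probabilities from the deep learning and clinical models.
p_dl = np.clip(0.6 * y_true + 0.4 * rng.uniform(size=200), 0.0, 1.0)
p_clin = np.clip(0.4 * y_true + 0.6 * rng.uniform(size=200), 0.0, 1.0)

w = 0.7                                                # fusion weight (illustrative)
p_fused = w * p_dl + (1 - w) * p_clin                  # linear weighted fusion

for name, p in [("DL", p_dl), ("clinical", p_clin), ("fused", p_fused)]:
    print(f"{name:>8s} AUC = {roc_auc_score(y_true, p):.3f}")
```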
A deep learning-based automated pipeline for colorectal cancer detection in contrast-enhanced CT images
Pub Date: 2026-01-26 | DOI: 10.1016/j.compmedimag.2026.102717
Chenhui Qiu, Sarah Miller, Barathi Subramanian, Angela Ryu, Haiyu Zhang, George A Fisher, Nigam H Shah, John Mongan, Curtis Langlotz, Peter Poullos, Jeanne Shen
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and a leading cause of cancer-related mortality. This study aims to investigate an automatic detection pipeline for identification and localization of the primary CRC in portal venous phase contrast-enhanced CT scans, which is a crucial first step for downstream CRC staging, prognostication, and treatment planning. We propose a deep learning-based automated detection pipeline using YOLOv11 as the baseline architecture. A ResNet50 module was incorporated into the YOLOv11 backbone to enhance image feature extraction. Additionally, a scale-adaptive loss function, which introduces an adaptive coefficient and a scaling factor to adaptively measure the Intersection over Union (IoU) and center point distance for improving box regression performance, was designed to further improve detection performance. The proposed pipeline achieved a recall of 0.8092, precision of 0.8187, and F-1 score of 0.8139 for CRC detection on our in-house dataset at the patient level (inter-patient evaluation) and a recall of 0.9949, precision of 0.9894, and F-1 score of 0.9921 at the slice level (intra-patient evaluation). Validation on an external public dataset demonstrated that our pipeline, when trained on a patient-level in-house dataset, obtained a recall of 0.8283, precision of 0.8414, and F-1 score of 0.8348 and, when trained on a slice-level in-house dataset, achieved a recall of 0.6897, precision of 0.7888, and F-1 score of 0.7358, outperforming existing representative detection methods. The superior CRC detection performance on the in-house CT dataset and state-of-the-art generalization performance on the public dataset (with a 31.97 percentage-point improvement in detection sensitivity (recall) over the next closest state-of-the-art method) highlight the potential translational value of our pipeline for CRC clinical decision support, conditional upon validation in larger cohorts.
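The scale-adaptive loss builds on the familiar combination of an IoU term and a center-point distance term for box regression. The sketch below implements a standard distance-IoU loss of that family; the adaptive coefficient and scaling factor proposed in the paper are not reproduced here.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    """Distance-IoU loss for (x1, y1, x2, y2) boxes:
    1 - IoU + squared center distance normalized by the enclosing-box diagonal."""
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    center_p = (pred[:, :2] + pred[:, 2:]) / 2           # box centers
    center_t = (target[:, :2] + target[:, 2:]) / 2
    center_dist = ((center_p - center_t) ** 2).sum(dim=1)

    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps     # enclosing-box diagonal (squared)

    return (1.0 - iou + center_dist / diag).mean()

pred = torch.tensor([[10.0, 10.0, 50.0, 50.0], [5.0, 5.0, 20.0, 30.0]])
gt = torch.tensor([[12.0, 8.0, 48.0, 52.0], [0.0, 0.0, 25.0, 25.0]])
print(diou_loss(pred, gt))
```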