The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes
Computerized Medical Imaging and Graphics, Volume 128, Article 102702
Pub Date: 2026-01-13 | DOI: 10.1016/j.compmedimag.2026.102702
Wietske A.P. Bastiaansen , Melek Rousian , Anton H.J. Koning , Wiro J. Niessen , Bernadette S. de Bakker , Régine P.M. Steegers-Theunissen , Stefan Klein
Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within a time span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.
{"title":"The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes","authors":"Wietske A.P. Bastiaansen , Melek Rousian , Anton H.J. Koning , Wiro J. Niessen , Bernadette S. de Bakker , Régine P.M. Steegers-Theunissen , Stefan Klein","doi":"10.1016/j.compmedimag.2026.102702","DOIUrl":"10.1016/j.compmedimag.2026.102702","url":null,"abstract":"<div><div>Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides a unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102702"},"PeriodicalIF":4.9,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TGIAlign: Text-guided dual-branch bidirectional framework for cross-modal semantic alignment in medical vision-language
Computerized Medical Imaging and Graphics, Volume 128, Article 102694
Pub Date: 2026-01-13 | DOI: 10.1016/j.compmedimag.2025.102694
Wenhua Li, Lifang Wang, Min Zhao, Xingzhang Lü, Linwen Yi
Medical image–text alignment remains challenging due to subtle lesion patterns, heterogeneous vision–language semantics, and the lack of lesion-aware guidance during visual encoding. Existing methods typically introduce textual information only after visual features have been computed, leaving early and mid-level representations insufficiently conditioned on diagnostic semantics. This limits the model’s ability to capture fine-grained abnormalities and maintain stable alignment across heterogeneous chest X-ray datasets. To address these limitations, we propose TGIAlign, a text-guided dual-branch bidirectional alignment framework that applies structured, lesion-centric cues to intermediate visual representations obtained from the frozen encoder. A large language model (LLM) is used to extract normalized, attribute-based lesion descriptions, providing consistent semantic guidance across samples. These cues are incorporated through the Text-Guided Image Feature Weighting (TGIF) module, which reweights intermediate feature outputs using similarity-derived weights, enabling multi-scale semantic conditioning without modifying the frozen backbone. To capture complementary visual cues, TGIAlign integrates multi-scale text-guided features with high-level visual representations through a Dual-Branch Bidirectional Alignment (DBBA) mechanism. Experiments on six public chest X-ray datasets demonstrate that TGIAlign achieves stable top-K retrieval and reliable text-guided lesion localization, highlighting the effectiveness of early semantic conditioning combined with dual-branch alignment for improving medical vision–language correspondence within chest X-ray settings.
{"title":"TGIAlign: Text-guided dual-branch bidirectional framework for cross-modal semantic alignment in medical vision-language","authors":"Wenhua Li, Lifang Wang, Min Zhao, Xingzhang Lü, Linwen Yi","doi":"10.1016/j.compmedimag.2025.102694","DOIUrl":"10.1016/j.compmedimag.2025.102694","url":null,"abstract":"<div><div>Medical image–text alignment remains challenging due to subtle lesion patterns, heterogeneous vision–language semantics, and the lack of lesion-aware guidance during visual encoding. Existing methods typically introduce textual information only after visual features have been computed, leaving early and mid-level representations insufficiently conditioned on diagnostic semantics. This limits the model’s ability to capture fine-grained abnormalities and maintain stable alignment across heterogeneous chest X-ray datasets. To address these limitations, we propose TGIAlign, a text-guided dual-branch bidirectional alignment framework that applies structured, lesion-centric cues to intermediate visual representations obtained from the frozen encoder. A large language model (LLM) is used to extract normalized, attribute-based lesion descriptions, providing consistent semantic guidance across samples. These cues are incorporated through the Text-Guided Image Feature Weighting (TGIF) module, which reweights intermediate feature outputs using similarity-derived weights, enabling multi-scale semantic conditioning without modifying the frozen backbone. To capture complementary visual cues, TGIAlign integrates multi-scale text-guided features with high-level visual representations through a Dual-Branch Bidirectional Alignment (DBBA) mechanism. Experiments on six public chest X-ray datasets demonstrate that TGIAlign achieves stable top-K retrieval and reliable text-guided lesion localization, highlighting the effectiveness of early semantic conditioning combined with dual-branch alignment for improving medical vision–language correspondence within chest X-ray settings.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102694"},"PeriodicalIF":4.9,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TA-MedSAM: Text-augmented improved MedSAM for pulmonary lesion segmentation
Computerized Medical Imaging and Graphics, Volume 128, Article 102698
Pub Date: 2026-01-12 | DOI: 10.1016/j.compmedimag.2026.102698
Siyuan Tang , Siriguleng Wang , Gang Xiang , Jinliang Zhao , Yuxin Wang
Accurate segmentation of lung lesions is critical for clinical diagnosis. Traditional methods rely solely on unimodal visual data, which limits the performance of existing medical image segmentation models. This paper introduces a novel approach, the Text-Augmented Medical Segment Anything Model (TA-MedSAM), which enhances cross-modal representation capabilities through a vision-language fusion paradigm. This method significantly improves segmentation accuracy for pulmonary lesions with challenging characteristics including low contrast, blurred boundaries, complex morphology, and small size. Firstly, we introduce a lightweight Medical Segment Anything Model (MedSAM) image encoder and a pre-trained ClinicalBERT text encoder to extract visual and textual features. This design preserves segmentation performance while reducing model parameters and computational costs, thereby enhancing inference speed. Secondly, a Reconstruction Text Module is proposed to focus the model on lesion-centric textual cues, strengthening semantic guidance for segmentation. Thirdly, we develop an effective Multimodal Feature Fusion Module that integrates visual and textual features using attention mechanisms, together with a feature alignment coordination mechanism that mutually enhances heterogeneous information across modalities; a Dynamic Perception Learning Mechanism then quantitatively evaluates fusion effectiveness, enabling selection of the optimal fused features for improved segmentation accuracy. Finally, a Multi-scale Feature Fusion Module combined with a Multi-task Loss Function enhances segmentation performance for complex regions. Comparative experiments demonstrate that TA-MedSAM outperforms state-of-the-art unimodal and multimodal methods on QaTa-COV19, MosMedData+, and a private dataset. Extensive ablation studies validate the efficacy of our proposed components and optimal hyperparameter combinations.
{"title":"TA-MedSAM: Text-augmented improved MedSAM for pulmonary lesion segmentation","authors":"Siyuan Tang , Siriguleng Wang , Gang Xiang , Jinliang Zhao , Yuxin Wang","doi":"10.1016/j.compmedimag.2026.102698","DOIUrl":"10.1016/j.compmedimag.2026.102698","url":null,"abstract":"<div><div>Accurate segmentation of lung lesions is critical for clinical diagnosis. Traditional methods rely solely on unimodal visual data, which limits the performance of existing medical image segmentation models. This paper introduces a novel approach, Text-Augmented Medical Segment Anything Module(TA-MedSAM), which enhances cross-modal representation capabilities through a vision-language fusion paradigm. This method significantly improves segmentation accuracy for pulmonary lesions with challenging characteristics including low contrast, blurred boundaries, complex morphology, and small size. Firstly, we introduce a lightweight Medical Segment Anything Model (MedSAM) image encoder and a pre-trained ClinicalBERT text encoder to extract visual and textual features, This design preserves segmentation performance while reducing model parameters and computational costs, thereby enhancing inference speed. Secondly, a Reconstruction Text Module is proposed to focus the model on lesion-centric textual cues, strengthening semantic guidance for segmentation. Thirdly, we develop an effective Multimodal Feature Fusion Module that integrates visual and textual features using attention mechanisms, and introduce a feature alignment coordination mechanism to mutually enhance heterogeneous information across modalities, and a Dynamic Perception Learning Mechanism is proposed to quantitatively evaluate fusion effectiveness, enabling optimal fused feature selection for improved segmentation accuracy. Finally, a Multi-scale Feature Fusion Module combined with a Multi-task Loss Function enhances segmentation performance for complex regions. Comparative experiments demonstrate that TA-MedSAM outperforms state-of-the-art unimodal and multimodal methods on QaTa-COV19, MosMedData+ , and private dataset. Extensive ablation studies validate the efficacy of our proposed components and optimal hyperparameter combinations.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102698"},"PeriodicalIF":4.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145979339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ThyFusionNet: A CNN–transformer framework with spatial aware sparse attention for multi modal thyroid disease diagnosis
Computerized Medical Imaging and Graphics, Volume 128, Article 102706
Pub Date: 2026-01-11 | DOI: 10.1016/j.compmedimag.2026.102706
Bing Yang , Jun Li , Junyang Chen , Yutong Huang , Nanbo Xu , Qiurui Liu , Jiaxin Liu , Yuheng Zhou
In medical image analysis, accurately diagnosing complex lesions remains a formidable challenge, especially for thyroid disorders, which exhibit high incidence and intricate pathology. To enhance diagnostic precision and robustness, we assembled ThyM3, a large-scale multimodal dataset comprising thyroid computed tomography and ultrasound images. Building on this resource, we introduce ThyFusionNet, a novel deep-learning architecture that combines convolutional backbones with transformer modules and performs feature-level fusion to exploit complementary cues across modalities. To improve semantic alignment and spatial modeling, we incorporate head-wise positional encodings and an adaptive sparse attention scheme that suppresses redundant activations while highlighting key features. Skip connections are used to retain low-level details, and a gated-attention fusion block further enriches cross-modal interaction. We also propose an adaptive contrastive-entropy loss that preserves feature consistency and simultaneously enhances prediction discriminability and stability. Extensive experiments demonstrate that ThyFusionNet surpasses current leading methods in accuracy, robustness, and generalization, underscoring its strong potential for clinical deployment.
{"title":"ThyFusionNet: A CNN–transformer framework with spatial aware sparse attention for multi modal thyroid disease diagnosis","authors":"Bing Yang , Jun Li , Junyang Chen , Yutong Huang , Nanbo Xu , Qiurui Liu , Jiaxin Liu , Yuheng Zhou","doi":"10.1016/j.compmedimag.2026.102706","DOIUrl":"10.1016/j.compmedimag.2026.102706","url":null,"abstract":"<div><div>In medical image analysis, accurately diagnosing complex lesions remains a formidable challenge, especially for thyroid disorders, which exhibit high incidence and intricate pathology. To enhance diagnostic precision and robustness, we assembled ThyM3, a large-scale multimodal dataset comprising thyroid computed tomography and ultrasound images. Building on this resource, we introduce ThyFusionNet, a novel deep-learning architecture that combines convolutional backbones with transformer modules and performs feature-level fusion to exploit complementary cues across modalities. To improve semantic alignment and spatial modeling, we incorporate head-wise positional encodings and an adaptive sparse attention scheme that suppresses redundant activations while highlighting key features. Skip connections are used to retain low-level details, and a gated-attention fusion block further enriches cross-modal interaction. We also propose an adaptive contrastive-entropy loss that preserves feature consistency and simultaneously enhances prediction discriminability and stability. Extensive experiments demonstrate that ThyFusionNet surpasses current leading methods in accuracy, robustness, and generalization, underscoring its strong potential for clinical deployment.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102706"},"PeriodicalIF":4.9,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction
Computerized Medical Imaging and Graphics, Volume 128, Article 102704
Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102704
Shyam Sundar Debsarkar , V.B. Surya Prasath
Recent advances in deep learning have significantly improved the accuracy of computational pathology; however, conventional model ensembling strategies often lack adaptability and interpretability, hindering clinical applicability. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets: HMU-GC-HE-30K, a gastric cancer dataset containing pathology images only, and BCNB, a multimodal breast cancer biopsy dataset containing both pathology imaging and clinical information. Our proposed multi-expert, LLM-arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs) and transformers, which are currently the de facto and state-of-the-art classification ensemble models, achieving better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on both datasets, and ablations show that learned per-agent trust materially improves the arbitrator’s decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models and than conventional ensembling with majority voting, uniform averaging, and meta-learners. The results highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.
{"title":"A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction","authors":"Shyam Sundar Debsarkar , V.B. Surya Prasath","doi":"10.1016/j.compmedimag.2026.102704","DOIUrl":"10.1016/j.compmedimag.2026.102704","url":null,"abstract":"<div><div>Recent advances in deep learning have significantly improved the accuracy of computational pathology; however conventional model ensembling strategies often lack adaptability and interpretability hindering the clinical adaptability. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts, and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets, namely the HMU-GC-HE-30K which is a gastric cancer dataset containing pathology images only, and the BCNB which is a breast cancer biopsy dataset that is multimodal — contains pathology imaging and clinical information. Our proposed multi-expert, LLM arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs), and transformers, which are currently the de facto and state-of-the-art classification ensemble models, with better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on the datasets, and ablations show that learned per-agent trust materially improves the arbitrator’s decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models, conventional ensembling with majority vote, uniform average, and meta-learners. The results obtained highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102704"},"PeriodicalIF":4.9,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid Transformer-CNN framework for uncertainty-guided semi-supervised multiclass eye disease classification with enhanced interpretability
Computerized Medical Imaging and Graphics, Volume 128, Article 102701
Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102701
Muhammad Hammad Malik , Zishuo Wan , Yingying Ren , Da-Wei Ding
The accurate classification of fundus images into cataract, diabetic retinopathy (DR), glaucoma, and healthy categories remains a critical challenge in ophthalmology, as these conditions require early diagnosis and treatment to prevent vision loss. Existing deep learning methods rely on large labeled datasets, make inefficient use of unlabeled data, and offer limited interpretability, restricting clinical applicability. To address these limitations, we propose a novel CNN-Transformer hybrid architecture coupled with innovative semi-supervised learning (SSL) and explainability techniques to enhance multiclass eye disease classification. Our methodology integrates a ConvNeXt backbone with Transformer modules, leveraging multi-head attention to effectively capture both spatial features and long-range dependencies. We introduce Uncertainty-Guided MixMatch (UG-MixMatch), a semi-supervised framework that leverages Monte Carlo (MC) dropout for uncertainty quantification and pseudo-label refinement, effectively utilizing both labeled and unlabeled data. For interpretability, we propose a novel Gradient-based Integrated Attention Map (GIAM), which aggregates attention maps across multiple layers and incorporates adaptive channel-wise weighting, offering more detailed insights into model predictions than traditional Grad-CAM methods. Evaluated on the Ocular Imaging Health (OIH) dataset of 4215 fundus images across four classes, our approach achieved a 95.27 % classification accuracy using UG-MixMatch and 95.51 % when incorporating MC dropout for direct model evaluation. Cohen’s kappa score reached 93.70, indicating near-perfect agreement with the ground truth. Class-wise performance was exceptional, with 100 % sensitivity and specificity for DR and over 95 % specificity for cataract and glaucoma. Robust AUC values were observed, including 1.00 for DR and cataract and 0.99 for glaucoma and healthy cases. GIAM visualizations effectively highlighted disease-relevant regions, offering enhanced clinical interpretability and validation potential. Our framework addresses data scarcity, enhances interpretability, and delivers clinically relevant performance, marking a promising step towards scalable, explainable, and accurate diagnostic tools for Clinical Decision Support Systems (CDSS) and ophthalmic screening.
{"title":"A hybrid Transformer-CNN framework for uncertainty-guided semi-supervised multiclass eye disease classification with enhanced interpretability","authors":"Muhammad Hammad Malik , Zishuo Wan , Yingying Ren , Da-Wei Ding","doi":"10.1016/j.compmedimag.2026.102701","DOIUrl":"10.1016/j.compmedimag.2026.102701","url":null,"abstract":"<div><div>The accurate classification of eye diseases such as cataract, diabetic retinopathy (DR), glaucoma, and healthy conditions from fundus images remains a critical challenge in ophthalmology, requiring early diagnosis and treatment to prevent vision loss. Existing deep learning methods rely on large labeled datasets, inefficient use of unlabeled data, and limited interpretability, restricting clinical applicability. To address these limitations, we propose a novel CNN-Transformer hybrid architecture coupled with innovative semi-supervised learning (SSL) and explainability techniques to enhance multiclass eye disease classification. Our methodology integrates a ConvNeXt backbone with Transformer modules, leveraging multi-head attention to effectively capture both spatial features and long-range dependencies. We introduce Uncertainty-Guided MixMatch (UG-MixMatch), a semi-supervised framework that leverages Monte Carlo (MC) dropout for uncertainty quantification and pseudo-label refinement, effectively utilizing both labeled and unlabeled data. For interpretability, we propose a novel Gradient-based Integrated Attention Map (GIAM), which aggregates attention maps across multiple layers. It incorporates adaptive channel-wise weighting, offering more detailed insights into model predictions, surpassing traditional Grad-CAM methods. Evaluated on the Ocular Imaging Health (OIH) dataset of 4215 fundus images across four classes, our approach achieved a 95.27 % classification accuracy using UG-MixMatch and 95.51 % when incorporating MC dropout for direct model evaluation. Cohen’s kappa score reached 93.70, indicating near-perfect agreement with the ground truth. Class-wise performance was exceptional, with 100 % sensitivity and specificity for DR and over 95 % specificity for cataract and glaucoma. Robust AUC values were observed, including 1.00 for DR and cataract and 0.99 for glaucoma and healthy cases. GIAM visualizations effectively highlighted disease-relevant regions, offering enhanced clinical interpretability and validation potential. Our framework addresses data scarcity, enhances interpretability, and delivers clinically relevant performance, a promising step towards scalable, explainable, and accurate diagnostic tools for Clinical Decision Support Systems (CDSS) and ophthalmic screening.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102701"},"PeriodicalIF":4.9,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SPARSE data, rich results: Few-shot semi-supervised learning via class-conditioned image translation
Computerized Medical Imaging and Graphics, Volume 128, Article 102705
Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102705
Guido Manni , Clemente Lauretti , Loredana Zollo , Paolo Soda
Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces a novel GAN-based semi-supervised learning framework specifically designed for low labeled-data regimes, evaluated across settings with 5 to 50 labeled samples per class. Our approach integrates three specialized neural networks: a generator for class-conditioned image translation, a discriminator for authenticity assessment and classification, and a dedicated classifier, within a three-phase training framework. The method alternates between supervised training on limited labeled data and unsupervised learning that leverages abundant unlabeled images through image-to-image translation rather than generation from noise. We employ ensemble-based pseudo-labeling that combines confidence-weighted predictions from the discriminator and classifier with temporal consistency through exponential moving averaging, enabling reliable label estimation for unlabeled data. Comprehensive evaluation across eleven MedMNIST datasets demonstrates that our approach achieves statistically significant improvements over six state-of-the-art GAN-based semi-supervised methods, with particularly strong performance in the extreme 5-shot setting where the scarcity of labeled data is most challenging. The framework maintains its superiority across all evaluated settings (5, 10, 20, and 50 shots per class). Our approach offers a practical solution for medical imaging applications where annotation costs are prohibitive, enabling robust classification performance even with minimal labeled data. Code is available at https://github.com/GuidoManni/SPARSE.
{"title":"SPARSE data, rich results: Few-shot semi-supervised learning via class-conditioned image translation","authors":"Guido Manni , Clemente Lauretti , Loredana Zollo , Paolo Soda","doi":"10.1016/j.compmedimag.2026.102705","DOIUrl":"10.1016/j.compmedimag.2026.102705","url":null,"abstract":"<div><div>Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces a novel GAN-based semi-supervised learning framework specifically designed for low labeled-data regimes, evaluated across settings with 5 to 50 labeled samples per class. Our approach integrates three specialized neural networks: a generator for class-conditioned image translation, a discriminator for authenticity assessment and classification, and a dedicated classifier, within a three-phase training framework. The method alternates between supervised training on limited labeled data and unsupervised learning that leverages abundant unlabeled images through image-to-image translation rather than generation from noise. We employ ensemble-based pseudo-labeling that combines confidence-weighted predictions from the discriminator and classifier with temporal consistency through exponential moving averaging, enabling reliable label estimation for unlabeled data. Comprehensive evaluation across eleven MedMNIST datasets demonstrates that our approach achieves statistically significant improvements over six state-of-the-art GAN-based semi-supervised methods, with particularly strong performance in the extreme 5-shot setting where the scarcity of labeled data is most challenging. The framework maintains its superiority across all evaluated settings (5, 10, 20, and 50 shots per class). Our approach offers a practical solution for medical imaging applications where annotation costs are prohibitive, enabling robust classification performance even with minimal labeled data. Code is available at <span><span>https://github.com/GuidoManni/SPARSE</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102705"},"PeriodicalIF":4.9,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SGAFNet: Robust brain tumor segmentation via learnable sequence-guided adaptive fusion in available MRI acquisitions
Computerized Medical Imaging and Graphics, Volume 128, Article 102703
Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102703
Zhuoneng Zhang , Luyi Han , Dengqiang Jia , Tianyu Zhang , Zehui Lin , Kahou Chan , Jiaju Huang , Shaobin Chen , Xiangyu Xiong , Sio-Kei Im , Tao Tan , Yue Sun
Automatic and accurate segmentation of brain tumors from Magnetic Resonance Imaging (MRI) data holds significant promise for advancing clinical applicability. However, substantial challenges persist in algorithm development, particularly in scenarios where MRI sequences are incomplete or missing. Although recent automatic segmentation methods have demonstrated notable progress in addressing incomplete sequence scenarios, they often overlook the varying contributions of different MRI sequences to the final segmentation. To address this limitation, we propose a Learnable Sequence-Guided Adaptive Fusion Network (SGAFNet) for robust brain tumor segmentation under incomplete sequence scenarios. Our architecture features parallel encoder–decoders for sequence-specific feature extraction, enhanced by two novel components: (1) a Learned Sequence-Guided Weighted Average (SGWA) module, which adaptively fuses features from different sequences by learning sequence-specific contribution factors based on embedded priors, and (2) a Sequence-Specific Attention (SSA) module, which establishes cross-sequence dependencies between the available sequence features and the fused features generated by the SGWA. Comprehensive experiments on the BraTS2018 and BraTS2020 datasets demonstrate that our framework achieves state-of-the-art performance in handling incomplete sequence scenarios compared to existing approaches, with ablation studies confirming the critical role of the proposed SGWA and SSA modules. Increased robustness to incomplete MRI acquisitions enhances clinical applicability, facilitating more consistent diagnostic workflows.
{"title":"SGAFNet: Robust brain tumor segmentation via learnable sequence-guided adaptive fusion in available MRI acquisitions","authors":"Zhuoneng Zhang , Luyi Han , Dengqiang Jia , Tianyu Zhang , Zehui Lin , Kahou Chan , Jiaju Huang , Shaobin Chen , Xiangyu Xiong , Sio-Kei Im , Tao Tan , Yue Sun","doi":"10.1016/j.compmedimag.2026.102703","DOIUrl":"10.1016/j.compmedimag.2026.102703","url":null,"abstract":"<div><div>Automatic and accurate segmentation of brain tumors from Magnetic Resonance Imaging (MRI) data holds significant promise for advancing clinical applicability. However, substantial challenges persist in algorithm development, particularly in scenarios where MRI sequences are incomplete or missing. Although recent automatic segmentation methods have demonstrated notable progress in addressing incomplete sequence scenarios, they often overlook the varying contributions of different MRI sequences to the final segmentation. To address this limitation, we propose a Learnable Sequence-Guided Adaptive Fusion Network (SGAFNet) for robust brain tumor segmentation under incomplete sequence scenarios. Our architecture features parallel encoder–decoders for sequence-specific feature extraction, enhanced by two novel components: (1) a Learned Sequence-Guided Weighted Average (SGWA) module, which adaptively fuses different sequence features by learning sequence-specific contribution factors based on embedded priors, and (2) a Sequence-Specific Attention (SSA) module, which establishes cross-sequence dependencies between available sequences feature and the fused features generated by the SGWA. Comprehensive experiments on the BraTS2018 and BraTS2020 datasets demonstrate that our framework achieves state-of-the-art performance in handling incomplete sequences scenarios compared to existing approaches, with ablation studies confirming the critical role of the proposed SGWA and SSA modules. Increased robustness to incomplete MRI acquisitions enhances clinical applicability, facilitating more consistent diagnostic workflows.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102703"},"PeriodicalIF":4.9,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated 3D cephalometry: A lightweight V-net for landmark localization on CBCT
Computerized Medical Imaging and Graphics, Volume 128, Article 102700
Pub Date: 2026-01-06 | DOI: 10.1016/j.compmedimag.2026.102700
Benedetta Baldini , Giulia Rubiu , Marco Serafin , Marco Bologna , Giuseppe Maurizio Facchi , Giuseppe Baselli , Gianluca Martino Tartaglia
Cephalometric analysis is a widely adopted procedure for clinical decision support in orthodontics. It involves manual identification of predefined anatomical landmarks on three-dimensional cone beam CT (CBCT) scans, followed by the computation of linear and angular measurements. To reduce processing time and operator dependency, this study aimed to develop a lightweight deep learning (DL) model capable of automatically localizing 16 anatomically defined landmarks. To ensure model robustness and generalizability, the model was trained on a dataset of 350 manually annotated CBCT scans acquired from various imaging systems, covering a wide range of patient ages and skeletal classifications. The trained model is a V-net, optimized for practical use in clinical workflows. The model achieved a mean localization error of 1.95 ± 1.06 mm, which falls within the clinically acceptable threshold of 2 mm. Moreover, the predicted landmarks were used to calculate cephalometric measurements, which were compared with manually derived values. The resulting errors were −0.15 ± 0.95° for angular measurements and 0.20 ± 0.28 mm for linear ones, with Bland–Altman analysis demonstrating strong agreement and acceptable variability. These results suggest that automated measurements can reliably replace manual ones. Given the clinical relevance of cephalometric parameters - particularly the ANB angle, which is critical for skeletal classification and orthodontic treatment planning - this model represents a promising clinical decision support tool. Additionally, its low computational complexity enables fast prediction, with a mean inference time below 32 s per scan, promoting its integration into routine clinical settings due to both technical feasibility and robustness across heterogeneous datasets.
{"title":"Automated 3D cephalometry: A lightweight V-net for landmark localization on CBCT","authors":"Benedetta Baldini , Giulia Rubiu , Marco Serafin , Marco Bologna , Giuseppe Maurizio Facchi , Giuseppe Baselli , Gianluca Martino Tartaglia","doi":"10.1016/j.compmedimag.2026.102700","DOIUrl":"10.1016/j.compmedimag.2026.102700","url":null,"abstract":"<div><div>Cephalometric analysis is a widely adopted procedure for clinical decision support in orthodontics. It involves manual identification of predefined anatomical landmarks on three-dimensional cone beam CT scans, followed by the computation of linear and angular measurements. To reduce processing time and operator dependency, this study aimed to develop a light-weight deep learning (DL) model capable of automatically localizing 16 anatomically defined landmarks. To ensure model robustness and generalizability, the model was trained on a dataset of 350 manually annotated CBCT scans acquired from various imaging systems, covering a wide range of patient ages and skeletal classifications. The trained model is a V-net, optimized for practical use in clinical workflows. The model achieved a mean localization error of 1.95 ± 1.06 mm, which falls within the clinically acceptable threshold of 2 mm. Moreover, the predicted landmarks were used to calculate cephalometric measurements and compare with manually derived values. The resulting errors was −0.15 ± 0.95° for angular measurements and 0.20 ± 0.28 mm for linear ones, with Bland–Altman analysis demonstrating strong agreement and acceptable variability. These results suggest that automated measurements can reliably replace manual ones. Given the clinical relevance of cephalometric parameters - particularly the ANB angle, which is critical for skeletal classification and orthodontic treatment planning - this model represents a promising clinical decision support tool. Additionally, its low computational complexity enables fast prediction, with mean inference time lower than 32 s per scan, promoting its integration into routine clinical settings due to both technical feasibility and robustness across heterogeneous datasets.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102700"},"PeriodicalIF":4.9,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GenoPath-MCA: Multimodal masked cross-attention between genomics and pathology for survival prediction
Computerized Medical Imaging and Graphics, Volume 128, Article 102699
Pub Date: 2026-01-05 | DOI: 10.1016/j.compmedimag.2026.102699
Kaixuan Zhang , Shuqi Dong , Peifeng Shi , Dingcan Hu , Geng Gao , Jinlin Yang , Tao Gan , Nini Rao
Survival prediction using whole slide images (WSIs) and bulk gene expression data is a key task in computational pathology, essential for automated risk assessment and personalized treatment planning. However, integrating WSIs with genomic features presents challenges due to inconsistent modality granularity, semantic disparity, and the lack of personalized fusion. We propose GenoPath-MCA, a novel multimodal framework that models dense cross-modal interactions between histopathology and gene expression data. A masked co-attention mechanism aligns features across modalities, and the Multimodal Masked Cross-Attention Module (M2CAM) jointly captures high-order image–gene and gene–gene relationships for enhanced semantic fusion. To address patient-level heterogeneity, we develop a Dynamic Modality Weight Adjustment Strategy (DMWAS) that adaptively modulates fusion weights based on the discriminative relevance of each modality. Additionally, an importance-guided patch selection strategy effectively filters redundant visual inputs, reducing computational cost while preserving critical context. Experiments on public multimodal cancer survival datasets demonstrate that GenoPath-MCA significantly outperforms existing methods in terms of concordance index and robustness. Visualizations of multimodal attention maps validate the biological interpretability and clinical potential of our approach.
{"title":"GenoPath-MCA: Multimodal masked cross-attention between genomics and pathology for survival prediction","authors":"Kaixuan Zhang , Shuqi Dong , Peifeng Shi , Dingcan Hu , Geng Gao , Jinlin Yang , Tao Gan , Nini Rao","doi":"10.1016/j.compmedimag.2026.102699","DOIUrl":"10.1016/j.compmedimag.2026.102699","url":null,"abstract":"<div><div>Survival prediction using whole slide images (WSIs) and bulk genes is a key task in computational pathology, essential for automated risk assessment and personalized treatment planning. While integrating WSIs with genomic features presents challenges due to inconsistent modality granularity, semantic disparity, and the lack of personalized fusion. We propose <strong>GenoPath-MCA</strong>, a novel multimodal framework that models dense cross-modal interactions between histopathology and gene expression data. A masked co-attention mechanism aligns features across modalities, and the Multimodal Masked Cross-Attention Module (<strong>M2CAM</strong>) jointly captures high-order image–gene and gene–gene relationships for enhanced semantic fusion. To address patient-level heterogeneity, we develop a Dynamic Modality Weight Adjustment Strategy (<strong>DMWAS</strong>) that adaptively modulates fusion weights based on the discriminative relevance of each modality. Additionally, an importance-guided patch selection strategy effectively filters redundant visual inputs, reducing computational cost while preserving critical context. Experiments on public multimodal cancer survival datasets demonstrate that GenoPath-MCA significantly outperforms existing methods in terms of concordance index and robustness. Visualizations of multimodal attention maps validate the biological interpretability and clinical potential of our approach.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"128 ","pages":"Article 102699"},"PeriodicalIF":4.9,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}