CRAD: Cognitive Aware Feature Refinement with Missing Modalities for Early Alzheimer’s Progression Prediction
Pub Date: 2025-12-01 | Epub Date: 2025-11-10 | DOI: 10.1016/j.compmedimag.2025.102664
Fei Liu , Shiuan-Ni Liang , Mohamed Hisham Jaward , Huey Fang Ong , Huabin Wang , Alzheimer’s Disease Neuroimaging Initiative , Australian Imaging Biomarkers and Lifestyle flagship study of ageing
Accurate diagnosis and early prediction of Alzheimer’s disease (AD) often require multiple neuroimaging modalities, but in many cases only one or two are available. This missing-modality problem degrades diagnostic accuracy and is a critical challenge in clinical practice. Multimodal knowledge distillation (KD) offers a promising solution by aligning complete knowledge from multimodal data with that of partial modalities. However, current methods focus on aligning high-level features, which limits their effectiveness due to insufficient transfer of reliable knowledge. In this work, we propose a novel Consistency Refinement-driven Multi-level Self-Attention Distillation framework (CRAD) for early Alzheimer’s progression prediction, which enables the cross-modal transfer of more robust shallow knowledge and uses self-attention to refine features. We develop a multi-level distillation module to progressively distill cross-modal discriminating knowledge, enabling lightweight yet reliable knowledge transfer. Moreover, we design a novel self-attention distillation module (PF-CMAD) to transfer disease-relevant intermediate knowledge; it leverages feature self-similarity to capture cross-modal correlations without introducing trainable parameters, enabling interpretable and efficient distillation. We also incorporate a consistency-evaluation-driven confidence regularization strategy within the distillation process, which dynamically refines knowledge using adaptive distillation controllers that assess teacher confidence. Comprehensive experiments demonstrate that our method achieves superior accuracy and robust cross-dataset generalization using only MRI for AD diagnosis and early progression prediction. The code is available at https://github.com/LiuFei-AHU/CRAD.
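A minimal sketch of the parameter-free idea behind PF-CMAD: distillation that matches teacher and student feature self-similarity maps. This is an illustration, not the authors' implementation (see the linked repository for that); the tensor shapes and the KL formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def self_similarity(feat: torch.Tensor) -> torch.Tensor:
    """Map features (B, C, H, W) to a (B, HW, HW) attention map computed
    from feature self-similarity -- no trainable parameters involved."""
    b, c, h, w = feat.shape
    x = feat.flatten(2).transpose(1, 2)        # (B, HW, C)
    x = F.normalize(x, dim=-1)                 # cosine similarity
    return torch.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)

def attention_distillation_loss(student_feat, teacher_feat):
    """Align the student's self-similarity map with the teacher's."""
    with torch.no_grad():
        t_attn = self_similarity(teacher_feat)
    s_attn = self_similarity(student_feat)
    return F.kl_div(s_attn.clamp_min(1e-8).log(), t_attn, reduction="batchmean")

# toy usage: an MRI-only student mimics a multimodal teacher at one level
s = torch.randn(2, 64, 16, 16)   # student (MRI-only) features
t = torch.randn(2, 64, 16, 16)   # teacher (multimodal) features
print(attention_distillation_loss(s, t).item())
```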
{"title":"CRAD: Cognitive Aware Feature Refinement with Missing Modalities for Early Alzheimer’s Progression Prediction","authors":"Fei Liu , Shiuan-Ni Liang , Mohamed Hisham Jaward , Huey Fang Ong , Huabin Wang , Alzheimer’s Disease Neuroimaging Initiative , Australian Imaging Biomarkers and Lifestyle flagship study of ageing","doi":"10.1016/j.compmedimag.2025.102664","DOIUrl":"10.1016/j.compmedimag.2025.102664","url":null,"abstract":"<div><div>Accurate diagnosis and early prediction of Alzheimer’s disease (AD) often require multiple neuroimageing modalities, but in many cases, only one or two modalities are available. This missing modality hinders the accuracy of diagnosis and is a critical challenge in clinical practice. Multimodal knowledge distillation (KD) offers a promising solution by aligning complete knowledge from multimodal data with that of partial modalities. However, current methods focus on aligning high-level features, which limit their effectiveness due to insufficient transfer of reliable knowledge. In this work, we propose a novel Consistency Refinement-driven Multi-level Self-Attention Distillation framework (CRAD) for Early Alzheimer’s Progression Prediction, which enables the cross-modal transfer of more robust shallow knowledge with self-attention to refine features. We develop a multi-level distillation module to progressively distill cross-modal discriminating knowledge, enabling lightweight yet reliable knowledge transfer. Moreover, we design a novel self-attention distillation module (PF-CMAD) to transfer disease-relevant intermediate knowledge, which leverages feature self-similarity to capture cross-modal correlations without introducing trainable parameters, enabling interpretable and efficient distillation. We incorporate a consistency-evaluation-driven confidence regularization strategy within the distillation process. This strategy dynamically refines knowledge using adaptive distillation controllers that assess teacher confidence. Comprehensive experiments demonstrate that our method achieves superior accuracy and robust cross-dataset generalization performance using only MRI for AD diagnosis and early progression prediction. The code is available at <span><span>https://github.com/LiuFei-AHU/CRAD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102664"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A CNN-Transformer fusion network for Diabetic retinopathy image classification
Pub Date: 2025-12-01 | Epub Date: 2025-10-21 | DOI: 10.1016/j.compmedimag.2025.102655
Xuan Huang , Zhuang Ai , Chongyang She , Qi Li , Qihao Wei , Sha Xu , Yaping Lu , Fanxin Zeng
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, yet current diagnosis relies on labor-intensive and subjective fundus image interpretation. Here we present a convolutional neural network-transformer fusion model (DR-CTFN) that integrates ConvNeXt and Swin Transformer architectures with a lightweight attention block (LAB) to enhance feature extraction. To address dataset imbalance, we applied standardized preprocessing and extensive image augmentation. On the Kaggle EyePACS dataset, DR-CTFN outperformed ConvNeXt and Swin Transformer in accuracy by 3.14% and 8.39%, respectively, and in area under the curve (AUC) by 1% and 26.08%. External validation on APTOS 2019 Blindness Detection and a clinical DR dataset yielded accuracies of 84.45% and 85.31%, with AUC values of 95.22% and 95.79%, respectively. These results demonstrate that DR-CTFN enables rapid, robust, and precise DR detection, offering a scalable approach for early diagnosis and prevention of vision loss, thereby improving quality of life for DR patients.
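A hypothetical sketch of how a lightweight attention block might fuse CNN and Transformer feature maps; the actual LAB and DR-CTFN wiring are described in the paper, and the class name and dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class LightweightAttentionFusion(nn.Module):
    """Fuse CNN (ConvNeXt-style) and Transformer (Swin-style) feature maps
    with channel attention -- a stand-in for the paper's LAB, not a copy."""
    def __init__(self, cnn_dim, trans_dim, fused_dim):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_dim, fused_dim, kernel_size=1)
        self.proj_trans = nn.Conv2d(trans_dim, fused_dim, kernel_size=1)
        self.attn = nn.Sequential(                 # squeeze-excite style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * fused_dim, fused_dim // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused_dim // 4, 2 * fused_dim, 1), nn.Sigmoid())
        self.head = nn.Conv2d(2 * fused_dim, fused_dim, kernel_size=1)

    def forward(self, f_cnn, f_trans):
        x = torch.cat([self.proj_cnn(f_cnn), self.proj_trans(f_trans)], dim=1)
        return self.head(x * self.attn(x))         # recalibrate, then mix

f_cnn = torch.randn(2, 768, 7, 7)     # e.g. ConvNeXt stage-4 features
f_trans = torch.randn(2, 1024, 7, 7)  # e.g. Swin stage-4 features
fused = LightweightAttentionFusion(768, 1024, 256)(f_cnn, f_trans)
print(fused.shape)                    # torch.Size([2, 256, 7, 7])
```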
{"title":"A CNN-Transformer fusion network for Diabetic retinopathy image classification","authors":"Xuan Huang , Zhuang Ai , Chongyang She , Qi Li , Qihao Wei , Sha Xu , Yaping Lu , Fanxin Zeng","doi":"10.1016/j.compmedimag.2025.102655","DOIUrl":"10.1016/j.compmedimag.2025.102655","url":null,"abstract":"<div><div>Diabetic retinopathy (DR) is a leading cause of blindness worldwide, yet current diagnosis relies on labor-intensive and subjective fundus image interpretation. Here we present a convolutional neural network-transformer fusion model (DR-CTFN) that integrates ConvNeXt and Swin Transformer algorithms with a lightweight attention block (LAB) to enhance feature extraction. To address dataset imbalance, we applied standardized preprocessing and extensive image augmentation. On the Kaggle EyePACS dataset, DR-CTFN outperformed ConvNeXt and Swin Transformer in accuracy by 3.14% and 8.39%, while also achieving a superior area under the curve (AUC) by 1% and 26.08%. External validation on APTOS 2019 Blindness Detection and a clinical DR dataset yielded accuracies of 84.45% and 85.31%, with AUC values of 95.22% and 95.79%, respectively. These results demonstrate that DR-CTFN enables rapid, robust, and precise DR detection, offering a scalable approach for early diagnosis and prevention of vision loss, thereby enhancing the quality of life for DR patients.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102655"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145394925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ESAM2-BLS: Enhanced segment anything model 2 for efficient breast lesion segmentation in ultrasound imaging
Pub Date: 2025-12-01 | Epub Date: 2025-10-10 | DOI: 10.1016/j.compmedimag.2025.102654
Lishuang Guo , Haonan Zhang , Chenbin Ma
Ultrasound imaging is an economical, efficient, and non-invasive diagnostic tool widely used for breast lesion screening and diagnosis. However, segmentation of lesion regions remains a significant challenge due to factors such as noise interference and variability in image quality. To address this issue, we propose enhanced segment anything model 2 for breast lesion segmentation (ESAM2-BLS), a deep learning model built on an optimized version of the SAM2 architecture. ESAM2-BLS customizes and fine-tunes the pre-trained SAM2 model by introducing an adapter module specifically designed for the characteristics of breast ultrasound images. The adapter module directly addresses ultrasound-specific challenges, including speckle noise, low-contrast boundaries, shadowing artifacts, and anisotropic resolution, through targeted architectural elements such as channel attention mechanisms, specialized convolution kernels, and optimized skip connections. This significantly improves segmentation accuracy, particularly for low-contrast and small lesion regions. Compared to traditional methods, ESAM2-BLS fully leverages the generalization capabilities of large models while incorporating multi-scale feature fusion and axial dilated depthwise convolution to effectively capture multi-level information from complex lesions. During decoding, the model enhances the identification of fine boundaries and small lesions through depthwise separable convolutions and skip connections, while maintaining a low computational cost. In five-fold cross-validation on two datasets comprising over 1600 patients, ESAM2-BLS achieves average Dice scores of 0.9077 and 0.8633; visualization of the segmentation results and interpretability analysis further confirm its accuracy and robustness. The model provides an efficient, reliable, and specialized automated solution for early breast cancer screening and diagnosis.
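A minimal sketch of a bottleneck adapter with channel attention, the general mechanism the abstract describes for adapting a frozen SAM2 encoder to ultrasound. Module names, token shapes, and the gating design are assumptions, not the exact ESAM2-BLS adapter.

```python
import torch
import torch.nn as nn

class UltrasoundAdapter(nn.Module):
    """Bottleneck adapter intended to sit inside a frozen SAM2 encoder
    block -- an illustrative sketch only."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.act = nn.GELU()
        self.up = nn.Linear(dim // reduction, dim)
        # channel attention to damp speckle-noise-dominated channels
        self.gate = nn.Sequential(nn.Linear(dim, dim // reduction), nn.ReLU(),
                                  nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # (B, N, C)
        residual = tokens
        x = self.up(self.act(self.down(tokens)))
        x = x * self.gate(tokens.mean(dim=1, keepdim=True))  # channel recalibration
        return residual + x                                  # only adapter params train

tokens = torch.randn(1, 4096, 256)   # hypothetical SAM2 image tokens
print(UltrasoundAdapter(256)(tokens).shape)
```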
{"title":"ESAM2-BLS: Enhanced segment anything model 2 for efficient breast lesion segmentation in ultrasound imaging","authors":"Lishuang Guo , Haonan Zhang , Chenbin Ma","doi":"10.1016/j.compmedimag.2025.102654","DOIUrl":"10.1016/j.compmedimag.2025.102654","url":null,"abstract":"<div><div>Ultrasound imaging, as an economical, efficient, and non-invasive diagnostic tool, is widely used for breast lesion screening and diagnosis. However, the segmentation of lesion regions remains a significant challenge due to factors such as noise interference and the variability in image quality. To address this issue, we propose a novel deep learning model named enhanced segment anything model 2 (SAM2) for breast lesion segmentation (ESAM2-BLS). This model is an optimized version of the SAM2 architecture. ESAM2-BLS customizes and fine-tunes the pre-trained SAM2 model by introducing an adapter module, specifically designed to accommodate the unique characteristics of breast ultrasound images. The adapter module directly addresses ultrasound-specific challenges including speckle noise, low contrast boundaries, shadowing artifacts, and anisotropic resolution through targeted architectural elements such as channel attention mechanisms, specialized convolution kernels, and optimized skip connections. This optimization significantly improves segmentation accuracy, particularly for low-contrast and small lesion regions. Compared to traditional methods, ESAM2-BLS fully leverages the generalization capabilities of large models while incorporating multi-scale feature fusion and axial dilated depthwise convolution to effectively capture multi-level information from complex lesions. During the decoding process, the model enhances the identification of fine boundaries and small lesions through depthwise separable convolutions and skip connections, while maintaining a low computational cost. Visualization of the segmentation results and interpretability analysis demonstrate that ESAM2-BLS achieves an average Dice score of 0.9077 and 0.8633 in five-fold cross-validation across two datasets with over 1600 patients. These results significantly improve segmentation accuracy and robustness. This model provides an efficient, reliable, and specialized automated solution for early breast cancer screening and diagnosis.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102654"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145356747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multistain multicompartment automatic segmentation in renal biopsies with thrombotic microangiopathies and other vasculopathies
Pub Date: 2025-12-01 | Epub Date: 2025-10-22 | DOI: 10.1016/j.compmedimag.2025.102658
Nicola Altini , Michela Prunella , Surya V. Seshan , Savino Sciascia , Antonella Barreca , Alessandro Del Gobbo , Stefan Porubsky , Hien Van Nguyen , Claudia Delprete , Berardino Prencipe , Deján Dobi , Daan P.C. van Doorn , Sjoerd A.M.E.G. Timmermans , Pieter van Paassen , Vitoantonio Bevilacqua , Jan Ulrich Becker
Automatic tissue segmentation is a necessary step for the bulk analysis of whole slide images (WSIs) from paraffin histology sections in kidney biopsies. However, existing models often fail to generalize across the main nephropathological staining methods and to capture the severe morphological distortions in arteries, arterioles, and glomeruli common in thrombotic microangiopathy (TMA) and other vasculopathies. Therefore, we developed an automatic multi-staining segmentation pipeline covering six key compartments: Artery, Arteriole, Glomerulus, Cortex, Medulla, and Capsule/Other. This framework enables downstream tasks such as counting and labeling at the instance, WSI, or biopsy level. Biopsies (n = 158) from seven centers (Cologne, Turin, Milan, Weill-Cornell, Mainz, Maastricht, and Budapest) were classified by expert nephropathologists into TMA (n = 87) or Mimickers (n = 71). Ground-truth expert segmentation masks were provided for all compartments, along with expert binary TMA classification labels for Glomerulus, Artery, and Arteriole. The biopsies were divided into training (n = 79), validation (n = 26), and test (n = 53) subsets. We benchmarked six deep learning models for semantic segmentation (U-Net, FPN, DeepLabV3+, Mask2Former, SegFormer, SegNeXt) and five for classification (ResNet-34, DenseNet-121, EfficientNet-v2-S, ConvNeXt-Small, Swin-v2-B), obtaining robust segmentation results across all compartments. On the test set, the best models achieved Dice coefficients of 0.903 (Cortex), 0.834 (Medulla), 0.816 (Capsule/Other), 0.922 (Glomerulus), 0.822 (Artery), and 0.553 (Arteriole). The best classification models achieved accuracies of 0.724 and 0.841 for the Glomerulus and the combined Artery plus Arteriole compartments, respectively. Furthermore, we release NePathTK (NephroPathology Toolkit), an open-source end-to-end pipeline integrated with QuPath, enabling accurate segmentation for decision support in nephropathology and large-scale analysis of kidney biopsies.
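The reported per-compartment Dice coefficients can be computed from integer label maps as below; this is a straightforward sketch of the standard metric, with toy random masks standing in for WSI predictions.

```python
import numpy as np

def dice_per_class(pred: np.ndarray, gt: np.ndarray, n_classes: int):
    """Per-compartment Dice on integer label maps, as used to report
    Cortex/Medulla/Capsule/Glomerulus/Artery/Arteriole scores."""
    scores = {}
    for c in range(1, n_classes):          # 0 = background
        p, g = pred == c, gt == c
        denom = p.sum() + g.sum()
        scores[c] = 2.0 * np.logical_and(p, g).sum() / denom if denom else np.nan
    return scores

# toy 6-compartment label maps (labels 1..6), e.g. tiles from a WSI
rng = np.random.default_rng(0)
pred = rng.integers(0, 7, size=(512, 512))
gt = rng.integers(0, 7, size=(512, 512))
print(dice_per_class(pred, gt, n_classes=7))
```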
{"title":"Multistain multicompartment automatic segmentation in renal biopsies with thrombotic microangiopathies and other vasculopathies","authors":"Nicola Altini , Michela Prunella , Surya V. Seshan , Savino Sciascia , Antonella Barreca , Alessandro Del Gobbo , Stefan Porubsky , Hien Van Nguyen , Claudia Delprete , Berardino Prencipe , Deján Dobi , Daan P.C. van Doorn , Sjoerd A.M.E.G. Timmermans , Pieter van Paassen , Vitoantonio Bevilacqua , Jan Ulrich Becker","doi":"10.1016/j.compmedimag.2025.102658","DOIUrl":"10.1016/j.compmedimag.2025.102658","url":null,"abstract":"<div><div>Automatic tissue segmentation is a necessary step for the bulk analysis of whole slide images (WSIs) from paraffin histology sections in kidney biopsies. However, existing models often fail to generalize across the main nephropathological staining methods and to capture the severe morphological distortions in arteries, arterioles, and glomeruli common in thrombotic microangiopathy (TMA) or other vasculopathies. Therefore, we developed an automatic multi-staining segmentation pipeline covering six key compartments: Artery, Arteriole, Glomerulus, Cortex, Medulla, and Capsule/Other. This framework enables downstream tasks such as counting and labeling at instance-, WSI- or biopsy-level. Biopsies (n = 158) from seven centers: Cologne, Turin, Milan, Weill-Cornell, Mainz, Maastricht, Budapest, were classified by expert nephropathologists into TMA (n = 87) or Mimickers (n = 71). Ground truth expert segmentation masks were provided for all compartments, and expert binary TMA classification labels for Glomerulus, Artery, Arteriole. The biopsies were divided into training (n = 79), validation (n = 26), and test (n = 53) subsets. We benchmarked six deep learning models for semantic segmentation (U-Net, FPN, DeepLabV3+, Mask2Former, SegFormer, SegNeXt) and five models for classification (ResNet-34, DenseNet-121, EfficientNet-v2-S, ConvNeXt-Small, Swin-v2-B). We obtained robust segmentation results across all compartments. On the test set, the best models achieved Dice coefficients of 0.903 (Cortex), 0.834 (Medulla), 0.816 (Capsule/Other), 0.922 (Glomerulus), 0.822 (Artery), and 0.553 (Arteriole). The best classification models achieved Accuracy of 0.724 and 0.841 for Glomerulus and Artery plus Arteriole compartments, respectively. Furthermore, we release NePathTK (NephroPathology Toolkit), a powerful open-source end-to-end pipeline integrated with QuPath, enabling accurate segmentation for decision support in nephropathology and large-scale analysis of kidney biopsies.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102658"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145410649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient frequency-decomposed transformer via large vision model guidance for surgical image desmoking
Pub Date: 2025-12-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.compmedimag.2025.102660
Jiaao Li , Diandian Guo , Youyu Wang , Yanhui Wan , Long Ma , Jialun Pei
Surgical image restoration plays a vital clinical role in improving visual quality during surgery, particularly in minimally invasive procedures where the operating field is frequently obscured by surgical smoke. However, progress on surgical image desmoking remains limited in both algorithm development and customized learning strategies. This work therefore approaches the desmoking task from both theoretical and practical perspectives. First, we analyze the intrinsic characteristics of surgical smoke degradation: (1) spatial localization and dynamics, (2) distinguishable frequency-domain patterns, and (3) the entangled representation of anatomical content and degradative artifacts. These observations motivate an efficient frequency-aware Transformer framework, SmoRestor, which aims to separate and restore true anatomical structures from complex degradations. Specifically, we introduce a high-order Fourier-embedded neighborhood attention transformer that enhances the model’s ability to capture structured degradation patterns across both spatial and frequency domains. We also leverage the semantic priors encoded by large vision models to disambiguate content from degradation through targeted guidance. Moreover, we propose a transfer learning paradigm that injects knowledge from large models into the main network, enabling it to effectively distinguish meaningful content from ambiguous corruption. Experimental results on both public and in-house datasets demonstrate substantial improvements in quantitative performance and visual quality. The source code will be made available.
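The observation that smoke has distinguishable frequency-domain patterns can be illustrated with a simple radial low-/high-pass split in the Fourier domain. This is a sketch of the frequency-decomposition idea only, not SmoRestor's high-order Fourier embedding; the cutoff radius is an arbitrary choice.

```python
import torch

def frequency_split(x: torch.Tensor, radius: float = 0.25):
    """Split a feature map into low/high frequency bands with a radial
    mask in the (shifted) Fourier domain."""
    freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, h),
                            torch.linspace(-0.5, 0.5, w), indexing="ij")
    low_mask = ((xx ** 2 + yy ** 2).sqrt() <= radius).to(x.dtype)
    def back(f):  # inverse transform, keep the real part
        return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1)), norm="ortho").real
    low = back(freq * low_mask)          # smoke is mostly smooth and low-frequency
    high = back(freq * (1 - low_mask))   # edges and instruments live here
    return low, high

x = torch.randn(1, 3, 64, 64)            # a smoky surgical frame (toy)
low, high = frequency_split(x)
print(torch.allclose(low + high, x, atol=1e-5))  # bands sum back to the input
```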
{"title":"Efficient frequency-decomposed transformer via large vision model guidance for surgical image desmoking","authors":"Jiaao Li , Diandian Guo , Youyu Wang , Yanhui Wan , Long Ma , Jialun Pei","doi":"10.1016/j.compmedimag.2025.102660","DOIUrl":"10.1016/j.compmedimag.2025.102660","url":null,"abstract":"<div><div>Surgical image restoration plays a vital clinical role in improving visual quality during surgery, particularly in minimally invasive procedures where the operating field is frequently obscured by surgical smoke. However, surgical image desmoking still has limited progress in algorithm development and customized learning strategies. In this regard, this work focuses on the task of desmoking from both theoretical and practical perspectives. First, we analyze the intrinsic characteristics of surgical smoke degradation: (1) spatial localization and dynamics, (2) distinguishable frequency-domain patterns, and (3) the entangled representation of anatomical content and degradative artifacts. These observations motivated us to propose an efficient frequency-aware Transformer framework, namely SmoRestor, which aims to separate and restore true anatomical structures from complex degradations. Specifically, we introduce a high-order Fourier-embedded neighborhood attention transformer that enhances the model’s ability to capture structured degradation patterns across both spatial and frequency domains. Besides, we utilize the semantic priors encoded by large vision models to disambiguate content from degradation through targeted guidance. Moreover, we propose an innovative transfer learning paradigm that injects knowledge from large models to the main network, enabling it to effectively distinguish meaningful content from ambiguous corruption. Experimental results on both public and in-house datasets demonstrate substantial improvements in quantitative performance and visual quality. The source code will be available.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102660"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Path and bone-contour regularized unpaired MRI-to-CT translation
Pub Date: 2025-12-01 | Epub Date: 2025-10-13 | DOI: 10.1016/j.compmedimag.2025.102656
Teng Zhou , Jax Luo , Yuping Sun , Yiheng Tan , Shun Yao , Nazim Haouchine , Scott Raymond
Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges of acquiring paired MRI and CT scans, robust methods capable of leveraging unpaired datasets are essential for advancing MRI-to-CT translation. Current unpaired methods, which rely predominantly on cycle consistency and contrastive learning frameworks, frequently struggle to translate anatomical features that are highly discernible on CT but less distinguishable on MRI, such as bone structures. This limitation makes these approaches less suitable for radiation therapy, where precise bone representation is essential for accurate treatment planning. To address this challenge, we propose a path- and bone-contour regularized approach for unpaired MRI-to-CT translation. In our method, MRI and CT images are projected into a shared latent space, where the MRI-to-CT mapping is modeled as a continuous flow governed by neural ordinary differential equations. The optimal mapping is obtained by minimizing the transition path length of the flow. To enhance the accuracy of translated bone structures, we introduce a trainable neural network that generates bone contours from MRI and implement mechanisms to directly and indirectly encourage the model to focus on bone contours and their adjacent regions. Evaluations on three datasets demonstrate that our method outperforms existing unpaired MRI-to-CT translation approaches, achieving lower overall error rates. Moreover, in a downstream bone segmentation task, our approach exhibits superior performance in preserving the fidelity of bone structures. Our code is available at: https://github.com/kennysyp/PaBoT.
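A discrete sketch of the path-length-minimizing latent flow: an Euler integration of a learned velocity field, with the accumulated path length used as a regularizer. The network sizes, step count, and loss weighting are assumptions; the paper formulates this with neural ODEs rather than a fixed Euler scheme.

```python
import torch
import torch.nn as nn

class LatentFlow(nn.Module):
    """Euler view of an MRI->CT latent flow: integrate a time-conditioned
    velocity field and accumulate the transition path length."""
    def __init__(self, dim: int, steps: int = 8):
        super().__init__()
        self.vel = nn.Sequential(nn.Linear(dim + 1, 128), nn.Tanh(),
                                 nn.Linear(128, dim))
        self.steps = steps

    def forward(self, z_mri: torch.Tensor):
        z, path_len, dt = z_mri, 0.0, 1.0 / self.steps
        for k in range(self.steps):
            t = torch.full_like(z[:, :1], k * dt)      # broadcast time channel
            v = self.vel(torch.cat([z, t], dim=1))
            z = z + dt * v                             # Euler step along the flow
            path_len = path_len + dt * v.norm(dim=1).mean()
        return z, path_len                             # z approximates the CT latent

flow = LatentFlow(dim=64)
z_ct, plen = flow(torch.randn(4, 64))
# translation loss plus path-length regularization (toy target latent)
loss = nn.functional.mse_loss(z_ct, torch.randn(4, 64)) + 0.1 * plen
loss.backward()
print(z_ct.shape, float(plen))
```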
{"title":"Path and bone-contour regularized unpaired MRI-to-CT translation","authors":"Teng Zhou , Jax Luo , Yuping Sun , Yiheng Tan , Shun Yao , Nazim Haouchine , Scott Raymond","doi":"10.1016/j.compmedimag.2025.102656","DOIUrl":"10.1016/j.compmedimag.2025.102656","url":null,"abstract":"<div><div>Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges associated with acquiring paired MRI and CT scans, the development of robust methods capable of leveraging unpaired datasets is essential for advancing the MRI-to-CT translation. Current unpaired MRI-to-CT translation methods, which predominantly rely on cycle consistency and contrastive learning frameworks, frequently encounter challenges in accurately translating anatomical features that are highly discernible on CT but less distinguishable on MRI, such as bone structures. This limitation renders these approaches less suitable for applications in radiation therapy, where precise bone representation is essential for accurate treatment planning. To address this challenge, we propose a path- and bone-contour regularized approach for unpaired MRI-to-CT translation. In our method, MRI and CT images are projected to a shared latent space, where the MRI-to-CT mapping is modeled as a continuous flow governed by neural ordinary differential equations. The optimal mapping is obtained by minimizing the transition path length of the flow. To enhance the accuracy of translated bone structures, we introduce a trainable neural network to generate bone contours from MRI and implement mechanisms to directly and indirectly encourage the model to focus on bone contours and their adjacent regions. Evaluations conducted on three datasets demonstrate that our method outperforms existing unpaired MRI-to-CT translation approaches, achieving lower overall error rates. Moreover, in a downstream bone segmentation task, our approach exhibits superior performance in preserving the fidelity of bone structures. Our code is available at: <span><span>https://github.com/kennysyp/PaBoT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102656"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145290008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Colorectal disease diagnosis with deep triple-stream fusion and attention refinement
Pub Date: 2025-12-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.compmedimag.2025.102669
Abdulfattah Ba Alawi , Abdullah Ammar Karcioglu , Ferhat Bozkurt
Colorectal cancer accounts for a significant proportion of global cancer-related mortality, underscoring the need for robust, early-stage diagnostic methods. In this study, we propose a novel end-to-end deep learning framework that integrates multiple advanced mechanisms to enhance the classification of colorectal disease from histopathologic and endoscopic images. Our model, TripleFusionNet, leverages a triple-stream architecture that combines the strengths of EfficientNetB3, ResNet50, and DenseNet121, enabling the extraction of rich, multi-level feature representations from input images. To strengthen discriminative feature modeling, a Multi-Scale Attention Module concurrently performs spatial and channel-wise recalibration, enabling the network to emphasize diagnostically salient regions. Additionally, a Squeeze-Excite Refinement Block (SERB) selectively enhances informative channel activations while attenuating noise and redundant signals. Feature representations from the individual backbones are adaptively fused through a Progressive Gated Fusion mechanism that dynamically learns context-aware weights for optimal feature integration and redundancy mitigation. We validate our approach on two colorectal benchmarks: CRCCD_V1 (14 classes) and LC25000 (binary). On CRCCD_V1, the best performance is obtained by a conventional classifier trained on our 256-D TripleFusionNet embeddings: an SVM with RBF kernel reaches 96.63% test accuracy with a macro F1 of 96.62%, with the stacking ensemble close behind. Five-fold cross-validation yields comparable out-of-fold means (0.964, with small standard deviations), confirming stability across partitions. End-to-end image-based baselines, including TripleFusionNet, are competitive but slightly surpassed by embedding-based classifiers, highlighting the utility of the learned representation. On LC25000, our method attains 100% accuracy. Beyond accuracy, the approach maintains strong precision, recall, F1, and ROC-AUC, and the fused embeddings transfer effectively to multiple conventional learners (e.g., Random Forest, XGBoost). These results confirm the model’s potential for real-world deployment in computer-aided diagnosis workflows, particularly within resource-constrained clinical settings.
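The winning recipe on CRCCD_V1 (a conventional classifier on top of deep embeddings) is easy to reproduce in outline. Here random vectors stand in for the 256-D TripleFusionNet embeddings, and the SVM hyperparameters are illustrative, not the paper's tuned values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Stand-in for learned embeddings: one 256-D vector per image,
# with 14 CRCCD_V1-style class labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 256))
y = rng.integers(0, 14, size=500)

# SVM with RBF kernel on standardized embeddings, scored by 5-fold CV
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```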
{"title":"Colorectal disease diagnosis with deep triple-stream fusion and attention refinement","authors":"Abdulfattah Ba Alawi , Abdullah Ammar Karcioglu , Ferhat Bozkurt","doi":"10.1016/j.compmedimag.2025.102669","DOIUrl":"10.1016/j.compmedimag.2025.102669","url":null,"abstract":"<div><div>Colorectal cancer constitutes a significant proportion of global cancer-related mortality, underscoring the imperative for robust and early-stage diagnostic methodologies. In this study, we propose a novel end-to-end deep learning framework that integrates multiple advanced mechanisms to enhance the classification of colorectal disease from histopathologic and endoscopic images. Our model, named <strong>TripleFusionNet</strong>, leverages a unique triple-stream architecture by combining the strengths of EfficientNetB3, ResNet50, and DenseNet121, enabling the extraction of rich, multi-level feature representations from input images. To augment discriminative feature modeling, a <em>Multi-Scale Attention Module</em> is integrated, which concurrently performs spatial and channel-wise recalibration, thereby enabling the network to emphasize diagnostically salient regions. Additionally, we incorporate a <em>Squeeze-Excite Refinement Block (SERB)</em> to selectively enhance informative channel activations while attenuating noise and redundant signals. Feature representations from the individual backbones are adaptively fused through a <em>Progressive Gated Fusion mechanism</em> that dynamically learns context-aware weighting for optimal feature integration and redundancy mitigation. We validate our approach on two colorectal benchmarks: CRCCD_V1 (14 classes) and LC25000 (binary). On CRCCD_V1, the best performance is obtained by a conventional classifier trained on our 256-D <em>TripleFusionNet</em> embeddings—SVM (RBF) reaches <strong>96.63%</strong> test accuracy with macro F1 <strong>96.62%</strong>, with the Stacking Ensemble close behind. With five-fold cross-validation, it yields comparable out-of-fold means (<strong>0.964</strong> with small standard deviations), confirming stability across partitions. End-to-end image-based baselines, including <em>TripleFusionNet</em>, are competitive but are slightly surpassed by embedding-based classifiers, highlighting the utility of the learned representation. On LC25000, our method attains <strong>100%</strong> accuracy. Beyond accuracy, the approach maintains strong precision, recall, F1, and ROC–AUC, and the fused embeddings transfer effectively to multiple conventional learners (e.g., Random Forest, XGBoost). These results confirm the potential of the model for real-world deployment in computer-aided diagnosis workflows, particularly within resource-constrained clinical settings.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102669"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DuetMatch: Harmonizing semi-supervised brain MRI segmentation via decoupled branch optimization
Pub Date: 2025-12-01 | Epub Date: 2025-11-11 | DOI: 10.1016/j.compmedimag.2025.102666
Thanh-Huy Nguyen , Hoang-Thien Nguyen , Vi Vu , Ba-Thinh Lam , Phat Huynh , Tianyang Wang , Xingjian Li , Ulas Bagci , Min Xu
The limited availability of annotated data in medical imaging makes semi-supervised learning increasingly appealing for its ability to learn from imperfect supervision. Recently, teacher-student frameworks have gained popularity for their training benefits and robust performance. However, jointly optimizing the entire network can hinder convergence and stability, especially in challenging scenarios. To address this for medical image segmentation, we propose DuetMatch, a novel dual-branch semi-supervised framework with asynchronous optimization, where each branch optimizes either the encoder or decoder while keeping the other frozen. To improve consistency under noisy conditions, we introduce Decoupled Dropout Perturbation, enforcing regularization across branches. We also design Pairwise CutMix Cross-Guidance to enhance model diversity by exchanging pseudo-labels through augmented input pairs. To mitigate confirmation bias from noisy pseudo-labels, we propose Consistency Matching, refining labels using stable predictions from frozen teacher models. Extensive experiments on benchmark brain MRI segmentation datasets, including ISLES2022 and BraTS, show that DuetMatch consistently outperforms state-of-the-art methods, demonstrating its effectiveness and robustness across diverse semi-supervised segmentation scenarios.
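A schematic of the asynchronous, decoupled optimization: in each step, one sub-network (encoder or decoder) of each branch is frozen while the other trains, and the branches supervise each other through detached predictions. The architectures, consistency loss, and schedule here are toy assumptions, not the full DuetMatch recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad_(flag)

def make_branch() -> nn.ModuleDict:
    # toy encoder/decoder pair standing in for a 3D segmentation network
    return nn.ModuleDict({"enc": nn.Conv3d(1, 8, 3, padding=1),
                          "dec": nn.Conv3d(8, 2, 3, padding=1)})

branch_a, branch_b = make_branch(), make_branch()
opt = torch.optim.Adam(list(branch_a.parameters()) + list(branch_b.parameters()),
                       lr=1e-3)
x = torch.randn(2, 1, 8, 32, 32)  # unlabeled 3D MRI patches

for step in range(2):
    # asynchronous optimization: per branch, freeze one sub-network
    set_requires_grad(branch_a["enc"], True)
    set_requires_grad(branch_a["dec"], False)
    set_requires_grad(branch_b["enc"], False)
    set_requires_grad(branch_b["dec"], True)
    logits_a = branch_a["dec"](branch_a["enc"](x))
    logits_b = branch_b["dec"](branch_b["enc"](x))
    # cross-guidance: each branch matches the other's detached prediction
    loss = (F.mse_loss(logits_a, logits_b.detach())
            + F.mse_loss(logits_b, logits_a.detach()))
    opt.zero_grad()
    loss.backward()
    opt.step()
print("unsupervised consistency loss:", float(loss))
```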
{"title":"DuetMatch: Harmonizing semi-supervised brain MRI segmentation via decoupled branch optimization","authors":"Thanh-Huy Nguyen , Hoang-Thien Nguyen , Vi Vu , Ba-Thinh Lam , Phat Huynh , Tianyang Wang , Xingjian Li , Ulas Bagci , Min Xu","doi":"10.1016/j.compmedimag.2025.102666","DOIUrl":"10.1016/j.compmedimag.2025.102666","url":null,"abstract":"<div><div>The limited availability of annotated data in medical imaging makes semi-supervised learning increasingly appealing for its ability to learn from imperfect supervision. Recently, teacher-student frameworks have gained popularity for their training benefits and robust performance. However, jointly optimizing the entire network can hinder convergence and stability, especially in challenging scenarios. To address this for medical image segmentation, we propose <em>DuetMatch</em>, a novel dual-branch semi-supervised framework with asynchronous optimization, where each branch optimizes either the encoder or decoder while keeping the other frozen. To improve consistency under noisy conditions, we introduce <strong>Decoupled Dropout Perturbation</strong>, enforcing regularization across branches. We also design <strong>Pairwise CutMix Cross-Guidance</strong> to enhance model diversity by exchanging pseudo-labels through augmented input pairs. To mitigate confirmation bias from noisy pseudo-labels, we propose <strong>Consistency Matching</strong>, refining labels using stable predictions from frozen teacher models. Extensive experiments on benchmark brain MRI segmentation datasets, including ISLES2022 and BraTS, show that DuetMatch consistently outperforms state-of-the-art methods, demonstrating its effectiveness and robustness across diverse semi-supervised segmentation scenarios.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102666"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145558034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anatomy-informed deep learning and radiomics for neurofibroma segmentation in whole-body MRI
Pub Date: 2025-12-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.compmedimag.2025.102667
Georgii Kolokolnikov , Marie-Lena Schmalhofer , Lennart Well , Said Farschtschi , Victor-Felix Mautner , Inka Ristow , René Werner
Background and Objectives:
Neurofibromatosis type 1 (NF1) is a genetic disorder characterized by the development of multiple neurofibromas (NFs) throughout the body. Accurate segmentation of these tumors in whole-body magnetic resonance imaging (WB-MRI) is critical for quantifying tumor burden and clinical decision-making. This study aims to develop a pipeline for NF segmentation in fat-suppressed T2-weighted WB-MRI that incorporates anatomical context and radiomics to improve accuracy and specificity.
Methods:
The proposed pipeline consists of three stages: (1) anatomy segmentation using MRSegmentator and refinement with a high-risk NF zone; (2) NF segmentation using an ensemble of 3D anisotropic anatomy-informed U-Nets; and (3) tumor candidate classification using radiomic features to filter false positives. The study used 109 WB-MRI scans from 74 NF1 patients, divided into training and three test sets representing in-domain (3T), domain-shifted (1.5T), and low tumor burden scenarios. Evaluation metrics included per-scan and per-tumor Dice Similarity Coefficient (DSC), Volume Overlap Error (VOE), Absolute Relative Volume Difference (ARVD), and per-scan F1 score. Statistical significance was assessed using Wilcoxon signed-rank tests with Bonferroni correction.
Results:
On the in-domain test set, the proposed ensemble of 3D anisotropic anatomy-informed U-Nets with tumor candidate classification achieved a per-scan DSC of 0.64, outperforming 2D nnU-Net (DSC: 0.52) and 3D full-resolution nnU-Net (DSC: 0.54). Performance was maintained on the domain-shift test set (DSC: 0.51) but declined on low tumor burden cases (DSC: 0.23). Preliminary inter-reader variability analysis showed model-to-expert agreement (DSC: 0.67–0.69) comparable to inter-expert agreement (DSC: 0.69).
Conclusions:
The proposed pipeline achieves the highest performance among established methods for automated NF segmentation in WB-MRI and approaches expert-level consistency. The integration of anatomical context and radiomics enhances robustness. Nonetheless, segmentation performance decreases in low tumor burden scenarios, indicating a key area for future methodological improvements. Additionally, the limited inter-reader agreement observed among experts underscores the inherent complexity and ambiguity of the NF segmentation task.
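For reference, the three per-scan overlap metrics named in the Methods (DSC, VOE, ARVD) can be computed from binary tumor masks as below; the masks here are synthetic stand-ins for WB-MRI segmentations.

```python
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray):
    """Dice Similarity Coefficient (DSC), Volume Overlap Error (VOE),
    and Absolute Relative Volume Difference (ARVD) on binary masks."""
    p, g = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    dsc = 2.0 * inter / (p.sum() + g.sum())
    voe = 1.0 - inter / union
    arvd = abs(p.sum() - g.sum()) / g.sum()
    return {"DSC": dsc, "VOE": voe, "ARVD": arvd}

# toy binary masks standing in for whole-body MRI tumor segmentations
rng = np.random.default_rng(1)
gt = rng.random((64, 64, 64)) > 0.9
pred = np.logical_and(gt, rng.random(gt.shape) > 0.3)  # under-segmentation
print({k: round(float(v), 3) for k, v in seg_metrics(pred, gt).items()})
```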
{"title":"Anatomy-informed deep learning and radiomics for neurofibroma segmentation in whole-body MRI","authors":"Georgii Kolokolnikov , Marie-Lena Schmalhofer , Lennart Well , Said Farschtschi , Victor-Felix Mautner , Inka Ristow , René Werner","doi":"10.1016/j.compmedimag.2025.102667","DOIUrl":"10.1016/j.compmedimag.2025.102667","url":null,"abstract":"<div><h3>Background and Objectives:</h3><div>Neurofibromatosis type 1 (NF1) is a genetic disorder characterized by the development of multiple neurofibromas (NFs) throughout the body. Accurate segmentation of these tumors in whole-body magnetic resonance imaging (WB-MRI) is critical for quantifying tumor burden and clinical decision-making. This study aims to develop a pipeline for NF segmentation in fat-suppressed T2-weighted WB-MRI that incorporates anatomical context and radiomics to improve accuracy and specificity.</div></div><div><h3>Methods:</h3><div>The proposed pipeline consists of three stages: (1) anatomy segmentation using MRSegmentator and refinement with a high-risk NF zone; (2) NF segmentation using an ensemble of 3D anisotropic anatomy-informed U-Nets; and (3) tumor candidate classification using radiomic features to filter false positives. The study used 109 WB-MRI scans from 74 NF1 patients, divided into training and three test sets representing in-domain (3T), domain-shifted (1.5T), and low tumor burden scenarios. Evaluation metrics included per-scan and per-tumor Dice Similarity Coefficient (DSC), Volume Overlap Error (VOE), Absolute Relative Volume Difference (ARVD), and per-scan F1 score. Statistical significance was assessed using Wilcoxon signed-rank tests with Bonferroni correction.</div></div><div><h3>Results:</h3><div>On the in-domain test set, the proposed ensemble of 3D anisotropic anatomy-informed U-Nets with tumor candidate classification achieved a per-scan DSC of 0.64, outperforming 2D nnU-Net (DSC: 0.52) and 3D full-resolution nnU-Net (DSC: 0.54). Performance was maintained on the domain-shift test set (DSC: 0.51) but declined on low tumor burden cases (DSC: 0.23). Preliminary inter-reader variability analysis showed model-to-expert agreement (DSC: 0.67–0.69) comparable to inter-expert agreement (DSC: 0.69).</div></div><div><h3>Conclusions:</h3><div>The proposed pipeline achieves the highest performance among established methods for automated NF segmentation in WB-MRI and approaches expert-level consistency. The integration of anatomical context and radiomics enhances robustness. Nonetheless, segmentation performance decreases in low tumor burden scenarios, indicating a key area for future methodological improvements. Additionally, the limited inter-reader agreement observed among experts underscores the inherent complexity and ambiguity of the NF segmentation task.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102667"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to variations in medical image acquisition protocols, segmentation models often exhibit degraded performance when applied to unseen domains. We argue that such degradation primarily stems from overfitting to source domains and insufficient dynamic adaptability to target domains. To address this issue, we propose a hallucinated domain generalization network with domain-aware dynamic representation for medical image segmentation, which introduces a novel "hallucination during training, dynamic representation during testing" scheme to effectively improve generalization. Specifically, we design an uncertainty-aware dynamic hallucination module that achieves adaptive transformation through Bézier curves and estimates potential domain shift by introducing an uncertainty-aware offset variable driven by channel-wise variance, generating diverse synthetic images. This approach breaks the limitations of source domain distributions while preserving original anatomical structures, effectively alleviating the model’s overfitting to the specific styles of source domains. Furthermore, we develop a domain-aware dynamic representation module that treats source domain knowledge as a foundation for understanding unknown domains. Concretely, we obtain unbiased estimates of global style prototypes through domain-wise statistical aggregation and a momentum update strategy. Then, input features are mapped to the unified source domain space through global style prototypes and similarity weights, mitigating performance degradation caused by domain shift during the testing phase. Extensive experiments on four heterogeneously distributed fundus image datasets and six multi-center prostate MRI datasets demonstrate that our approach outperforms state-of-the-art methods.
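The Bézier-curve style transformation at the heart of the hallucination module can be sketched as a smooth intensity remapping. The scalar offset below is a simplified stand-in for the paper's uncertainty-aware, channel-variance-driven offset.

```python
import numpy as np

def bezier_intensity_transform(img: np.ndarray, offset: float) -> np.ndarray:
    """Smooth cubic Bézier remapping of intensities in [0, 1]; shifting the
    inner control points changes image 'style' while preserving anatomy."""
    p0, p3 = 0.0, 1.0
    p1 = float(np.clip(1 / 3 + offset, 0, 1))
    p2 = float(np.clip(2 / 3 - offset, 0, 1))
    t = np.linspace(0, 1, 256)
    # with x-control-points at 0, 1/3, 2/3, 1, the curve's x-coordinate is t
    curve = ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
             + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
    return np.interp(img, t, curve)      # look up each pixel on the curve

rng = np.random.default_rng(0)
mri = rng.random((128, 128))             # normalized source-domain slice (toy)
hallucinated = bezier_intensity_transform(mri, offset=rng.normal(0, 0.15))
print(hallucinated.min(), hallucinated.max())
```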
{"title":"Hallucinated domain generalization network with domain-aware dynamic representation for medical image segmentation","authors":"Minjun Wang, Houjin Chen, Yanfeng Li, Jia Sun, Luyifu Chen, Peng Liang","doi":"10.1016/j.compmedimag.2025.102670","DOIUrl":"10.1016/j.compmedimag.2025.102670","url":null,"abstract":"<div><div>Due to variations in medical image acquisition protocols, segmentation models often exhibit degraded performance when applied to unseen domains. We argue that such degradation primarily stems from overfitting to source domains and insufficient dynamic adaptability to target domains. To address this issue, we propose a hallucinated domain generalization network with domain-aware dynamic representation for medical image segmentation, which introduces a novel ”hallucination during training, dynamic representation during testing” scheme to effectively improve generalization. Specifically, we design an uncertainty-aware dynamic hallucination module that achieves adaptive transformation through Bézier curves and estimates potential domain shift by introducing the uncertainty-aware offset variable driven by channel-wise variance, generating diverse synthetic images. This approach breaks the limitations of source domain distributions while preserving original anatomical structures, effectively alleviating the model’s overfitting to the specific styles of source domains. Furthermore, we develop a domain-aware dynamic representation module that treats source domain knowledge as a foundation for understanding unknown domains. Concretely, we obtain unbiased estimates of global style prototypes through domain-wise statistical aggregation and the momentum update strategy. Then, input features are mapped to the unified source domain space through global style prototypes and similarity weights, mitigating performance degradation caused by domain shift during the testing phase. Extensive experiments on four heterogeneously distributed fundus image datasets and six multi-center prostate MRI datasets demonstrate that our approach outperforms state-of-the-art methods.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102670"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145566238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}