Pub Date: 2025-11-03 | DOI: 10.1016/j.compmedimag.2025.102660
Title: Efficient frequency-decomposed transformer via large vision model guidance for surgical image desmoking
Authors: Jiaao Li, Diandian Guo, Youyu Wang, Yanhui Wan, Long Ma, Jialun Pei
Surgical image restoration plays a vital clinical role in improving visual quality during surgery, particularly in minimally invasive procedures where the operating field is frequently obscured by surgical smoke. However, progress on surgical image desmoking remains limited in both algorithm development and customized learning strategies. Accordingly, this work addresses desmoking from both theoretical and practical perspectives. First, we analyze the intrinsic characteristics of surgical smoke degradation: (1) spatial localization and dynamics, (2) distinguishable frequency-domain patterns, and (3) the entangled representation of anatomical content and degradation artifacts. These observations motivate an efficient frequency-aware Transformer framework, SmoRestor, which aims to separate and restore true anatomical structures from complex degradations. Specifically, we introduce a high-order Fourier-embedded neighborhood attention transformer that enhances the model's ability to capture structured degradation patterns across both the spatial and frequency domains. In addition, we leverage the semantic priors encoded by large vision models to disambiguate content from degradation through targeted guidance. Moreover, we propose a transfer learning paradigm that injects knowledge from large models into the main network, enabling it to effectively distinguish meaningful content from ambiguous corruption. Experimental results on both public and in-house datasets demonstrate substantial improvements in quantitative performance and visual quality. The source code will be made publicly available.
{"title":"Efficient frequency-decomposed transformer via large vision model guidance for surgical image desmoking","authors":"Jiaao Li , Diandian Guo , Youyu Wang , Yanhui Wan , Long Ma , Jialun Pei","doi":"10.1016/j.compmedimag.2025.102660","DOIUrl":"10.1016/j.compmedimag.2025.102660","url":null,"abstract":"<div><div>Surgical image restoration plays a vital clinical role in improving visual quality during surgery, particularly in minimally invasive procedures where the operating field is frequently obscured by surgical smoke. However, surgical image desmoking still has limited progress in algorithm development and customized learning strategies. In this regard, this work focuses on the task of desmoking from both theoretical and practical perspectives. First, we analyze the intrinsic characteristics of surgical smoke degradation: (1) spatial localization and dynamics, (2) distinguishable frequency-domain patterns, and (3) the entangled representation of anatomical content and degradative artifacts. These observations motivated us to propose an efficient frequency-aware Transformer framework, namely SmoRestor, which aims to separate and restore true anatomical structures from complex degradations. Specifically, we introduce a high-order Fourier-embedded neighborhood attention transformer that enhances the model’s ability to capture structured degradation patterns across both spatial and frequency domains. Besides, we utilize the semantic priors encoded by large vision models to disambiguate content from degradation through targeted guidance. Moreover, we propose an innovative transfer learning paradigm that injects knowledge from large models to the main network, enabling it to effectively distinguish meaningful content from ambiguous corruption. Experimental results on both public and in-house datasets demonstrate substantial improvements in quantitative performance and visual quality. The source code will be available.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102660"},"PeriodicalIF":4.9,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-01 | DOI: 10.1016/j.compmedimag.2025.102659
Title: Med-SCoT: Structured chain-of-thought reasoning and evaluation for enhancing interpretability in medical visual question answering
Authors: Jinhao Qiao, Sihan Li, Jiang Liu, Heng Yu, Yi Xiao, Hongshan Yu, Yan Zheng
Most existing medical visual question answering (Med-VQA) methods emphasize answer accuracy while neglecting the reasoning process, limiting interpretability and reliability in clinical settings. To address this issue, we introduce Med-SCoT, a vision-language model that performs structured chain-of-thought (SCoT) reasoning by explicitly decomposing inference into four stages: Summary, Caption, Reasoning, and Conclusion. To facilitate training, we propose a multi-model collaborative correction (CoCo) annotation pipeline and construct three Med-VQA datasets with structured reasoning chains. We further develop SCoTEval, a comprehensive evaluation framework combining metric-based scores and large language model (LLM) assessments to enable fine-grained analysis of reasoning quality. Experimental results demonstrate that Med-SCoT achieves strong answer accuracy while generating structured, clinically aligned, and logically coherent reasoning chains. Moreover, SCoTEval exhibits high agreement with expert judgments, validating its reliability for structured reasoning assessment. The code, data, and models are available at: https://github.com/qiaodongxing/Med-SCoT.
{"title":"Med-SCoT: Structured chain-of-thought reasoning and evaluation for enhancing interpretability in medical visual question answering","authors":"Jinhao Qiao , Sihan Li , Jiang Liu , Heng Yu , Yi Xiao , Hongshan Yu , Yan Zheng","doi":"10.1016/j.compmedimag.2025.102659","DOIUrl":"10.1016/j.compmedimag.2025.102659","url":null,"abstract":"<div><div>Most existing medical visual question answering (Med-VQA) methods emphasize answer accuracy while neglecting the reasoning process, limiting interpretability and reliability in clinical settings. To address this issue, we introduce Med-SCoT, a vision-language model that performs structured chain-of-thought (SCoT) reasoning by explicitly decomposing inference into four stages: Summary, Caption, Reasoning, and Conclusion. To facilitate training, we propose a multi-model collaborative correction (CoCo) annotation pipeline and construct three Med-VQA datasets with structured reasoning chains. We further develop SCoTEval, a comprehensive evaluation framework combining metric-based scores and large language model (LLM) assessments to enable fine-grained analysis of reasoning quality. Experimental results demonstrate that Med-SCoT achieves advanced answer accuracy while generating structured, clinically aligned and logically coherent reasoning chains. Moreover, SCoTEval exhibits high agreement with expert judgments, validating its reliability for structured reasoning assessment. The code, data, and models are available at: <span><span>https://github.com/qiaodongxing/Med-SCoT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102659"},"PeriodicalIF":4.9,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-22 | DOI: 10.1016/j.compmedimag.2025.102658
Title: Multistain multicompartment automatic segmentation in renal biopsies with thrombotic microangiopathies and other vasculopathies
Authors: Nicola Altini, Michela Prunella, Surya V. Seshan, Savino Sciascia, Antonella Barreca, Alessandro Del Gobbo, Stefan Porubsky, Hien Van Nguyen, Claudia Delprete, Berardino Prencipe, Deján Dobi, Daan P.C. van Doorn, Sjoerd A.M.E.G. Timmermans, Pieter van Paassen, Vitoantonio Bevilacqua, Jan Ulrich Becker
Automatic tissue segmentation is a necessary step for the bulk analysis of whole slide images (WSIs) from paraffin histology sections in kidney biopsies. However, existing models often fail to generalize across the main nephropathological staining methods and to capture the severe morphological distortions of arteries, arterioles, and glomeruli common in thrombotic microangiopathy (TMA) or other vasculopathies. Therefore, we developed an automatic multi-staining segmentation pipeline covering six key compartments: Artery, Arteriole, Glomerulus, Cortex, Medulla, and Capsule/Other. This framework enables downstream tasks such as counting and labeling at the instance, WSI, or biopsy level. Biopsies (n = 158) from seven centers (Cologne, Turin, Milan, Weill-Cornell, Mainz, Maastricht, and Budapest) were classified by expert nephropathologists into TMA (n = 87) or Mimickers (n = 71). Ground-truth expert segmentation masks were provided for all compartments, along with expert binary TMA classification labels for the Glomerulus, Artery, and Arteriole compartments. The biopsies were divided into training (n = 79), validation (n = 26), and test (n = 53) subsets. We benchmarked six deep learning models for semantic segmentation (U-Net, FPN, DeepLabV3+, Mask2Former, SegFormer, SegNeXt) and five models for classification (ResNet-34, DenseNet-121, EfficientNet-v2-S, ConvNeXt-Small, Swin-v2-B). We obtained robust segmentation results across all compartments. On the test set, the best models achieved Dice coefficients of 0.903 (Cortex), 0.834 (Medulla), 0.816 (Capsule/Other), 0.922 (Glomerulus), 0.822 (Artery), and 0.553 (Arteriole). The best classification models achieved accuracies of 0.724 and 0.841 for the Glomerulus and the combined Artery plus Arteriole compartments, respectively. Furthermore, we release NePathTK (NephroPathology Toolkit), an open-source end-to-end pipeline integrated with QuPath, enabling accurate segmentation for decision support in nephropathology and large-scale analysis of kidney biopsies.
{"title":"Multistain multicompartment automatic segmentation in renal biopsies with thrombotic microangiopathies and other vasculopathies","authors":"Nicola Altini , Michela Prunella , Surya V. Seshan , Savino Sciascia , Antonella Barreca , Alessandro Del Gobbo , Stefan Porubsky , Hien Van Nguyen , Claudia Delprete , Berardino Prencipe , Deján Dobi , Daan P.C. van Doorn , Sjoerd A.M.E.G. Timmermans , Pieter van Paassen , Vitoantonio Bevilacqua , Jan Ulrich Becker","doi":"10.1016/j.compmedimag.2025.102658","DOIUrl":"10.1016/j.compmedimag.2025.102658","url":null,"abstract":"<div><div>Automatic tissue segmentation is a necessary step for the bulk analysis of whole slide images (WSIs) from paraffin histology sections in kidney biopsies. However, existing models often fail to generalize across the main nephropathological staining methods and to capture the severe morphological distortions in arteries, arterioles, and glomeruli common in thrombotic microangiopathy (TMA) or other vasculopathies. Therefore, we developed an automatic multi-staining segmentation pipeline covering six key compartments: Artery, Arteriole, Glomerulus, Cortex, Medulla, and Capsule/Other. This framework enables downstream tasks such as counting and labeling at instance-, WSI- or biopsy-level. Biopsies (n = 158) from seven centers: Cologne, Turin, Milan, Weill-Cornell, Mainz, Maastricht, Budapest, were classified by expert nephropathologists into TMA (n = 87) or Mimickers (n = 71). Ground truth expert segmentation masks were provided for all compartments, and expert binary TMA classification labels for Glomerulus, Artery, Arteriole. The biopsies were divided into training (n = 79), validation (n = 26), and test (n = 53) subsets. We benchmarked six deep learning models for semantic segmentation (U-Net, FPN, DeepLabV3+, Mask2Former, SegFormer, SegNeXt) and five models for classification (ResNet-34, DenseNet-121, EfficientNet-v2-S, ConvNeXt-Small, Swin-v2-B). We obtained robust segmentation results across all compartments. On the test set, the best models achieved Dice coefficients of 0.903 (Cortex), 0.834 (Medulla), 0.816 (Capsule/Other), 0.922 (Glomerulus), 0.822 (Artery), and 0.553 (Arteriole). The best classification models achieved Accuracy of 0.724 and 0.841 for Glomerulus and Artery plus Arteriole compartments, respectively. Furthermore, we release NePathTK (NephroPathology Toolkit), a powerful open-source end-to-end pipeline integrated with QuPath, enabling accurate segmentation for decision support in nephropathology and large-scale analysis of kidney biopsies.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102658"},"PeriodicalIF":4.9,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145410649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-21 | DOI: 10.1016/j.compmedimag.2025.102655
Title: A CNN-Transformer fusion network for Diabetic retinopathy image classification
Authors: Xuan Huang, Zhuang Ai, Chongyang She, Qi Li, Qihao Wei, Sha Xu, Yaping Lu, Fanxin Zeng
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, yet current diagnosis relies on labor-intensive and subjective fundus image interpretation. Here we present a convolutional neural network-transformer fusion model (DR-CTFN) that integrates ConvNeXt and Swin Transformer algorithms with a lightweight attention block (LAB) to enhance feature extraction. To address dataset imbalance, we applied standardized preprocessing and extensive image augmentation. On the Kaggle EyePACS dataset, DR-CTFN outperformed ConvNeXt and Swin Transformer in accuracy by 3.14% and 8.39%, respectively, and improved the area under the curve (AUC) by 1% and 26.08%. External validation on APTOS 2019 Blindness Detection and a clinical DR dataset yielded accuracies of 84.45% and 85.31%, with AUC values of 95.22% and 95.79%, respectively. These results demonstrate that DR-CTFN enables rapid, robust, and precise DR detection, offering a scalable approach for early diagnosis and prevention of vision loss, thereby enhancing the quality of life for DR patients.
{"title":"A CNN-Transformer fusion network for Diabetic retinopathy image classification","authors":"Xuan Huang , Zhuang Ai , Chongyang She , Qi Li , Qihao Wei , Sha Xu , Yaping Lu , Fanxin Zeng","doi":"10.1016/j.compmedimag.2025.102655","DOIUrl":"10.1016/j.compmedimag.2025.102655","url":null,"abstract":"<div><div>Diabetic retinopathy (DR) is a leading cause of blindness worldwide, yet current diagnosis relies on labor-intensive and subjective fundus image interpretation. Here we present a convolutional neural network-transformer fusion model (DR-CTFN) that integrates ConvNeXt and Swin Transformer algorithms with a lightweight attention block (LAB) to enhance feature extraction. To address dataset imbalance, we applied standardized preprocessing and extensive image augmentation. On the Kaggle EyePACS dataset, DR-CTFN outperformed ConvNeXt and Swin Transformer in accuracy by 3.14% and 8.39%, while also achieving a superior area under the curve (AUC) by 1% and 26.08%. External validation on APTOS 2019 Blindness Detection and a clinical DR dataset yielded accuracies of 84.45% and 85.31%, with AUC values of 95.22% and 95.79%, respectively. These results demonstrate that DR-CTFN enables rapid, robust, and precise DR detection, offering a scalable approach for early diagnosis and prevention of vision loss, thereby enhancing the quality of life for DR patients.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102655"},"PeriodicalIF":4.9,"publicationDate":"2025-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145394925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-13 | DOI: 10.1016/j.compmedimag.2025.102656
Title: Path and bone-contour regularized unpaired MRI-to-CT translation
Authors: Teng Zhou, Jax Luo, Yuping Sun, Yiheng Tan, Shun Yao, Nazim Haouchine, Scott Raymond
Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges associated with acquiring paired MRI and CT scans, the development of robust methods capable of leveraging unpaired datasets is essential for advancing MRI-to-CT translation. Current unpaired MRI-to-CT translation methods, which predominantly rely on cycle consistency and contrastive learning frameworks, frequently encounter challenges in accurately translating anatomical features that are highly discernible on CT but less distinguishable on MRI, such as bone structures. This limitation renders these approaches less suitable for applications in radiation therapy, where precise bone representation is essential for accurate treatment planning. To address this challenge, we propose a path- and bone-contour regularized approach for unpaired MRI-to-CT translation. In our method, MRI and CT images are projected to a shared latent space, where the MRI-to-CT mapping is modeled as a continuous flow governed by neural ordinary differential equations. The optimal mapping is obtained by minimizing the transition path length of the flow. To enhance the accuracy of translated bone structures, we introduce a trainable neural network to generate bone contours from MRI and implement mechanisms to directly and indirectly encourage the model to focus on bone contours and their adjacent regions. Evaluations conducted on three datasets demonstrate that our method outperforms existing unpaired MRI-to-CT translation approaches, achieving lower overall error rates. Moreover, in a downstream bone segmentation task, our approach exhibits superior performance in preserving the fidelity of bone structures. Our code is available at: https://github.com/kennysyp/PaBoT.
{"title":"Path and bone-contour regularized unpaired MRI-to-CT translation","authors":"Teng Zhou , Jax Luo , Yuping Sun , Yiheng Tan , Shun Yao , Nazim Haouchine , Scott Raymond","doi":"10.1016/j.compmedimag.2025.102656","DOIUrl":"10.1016/j.compmedimag.2025.102656","url":null,"abstract":"<div><div>Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges associated with acquiring paired MRI and CT scans, the development of robust methods capable of leveraging unpaired datasets is essential for advancing the MRI-to-CT translation. Current unpaired MRI-to-CT translation methods, which predominantly rely on cycle consistency and contrastive learning frameworks, frequently encounter challenges in accurately translating anatomical features that are highly discernible on CT but less distinguishable on MRI, such as bone structures. This limitation renders these approaches less suitable for applications in radiation therapy, where precise bone representation is essential for accurate treatment planning. To address this challenge, we propose a path- and bone-contour regularized approach for unpaired MRI-to-CT translation. In our method, MRI and CT images are projected to a shared latent space, where the MRI-to-CT mapping is modeled as a continuous flow governed by neural ordinary differential equations. The optimal mapping is obtained by minimizing the transition path length of the flow. To enhance the accuracy of translated bone structures, we introduce a trainable neural network to generate bone contours from MRI and implement mechanisms to directly and indirectly encourage the model to focus on bone contours and their adjacent regions. Evaluations conducted on three datasets demonstrate that our method outperforms existing unpaired MRI-to-CT translation approaches, achieving lower overall error rates. Moreover, in a downstream bone segmentation task, our approach exhibits superior performance in preserving the fidelity of bone structures. Our code is available at: <span><span>https://github.com/kennysyp/PaBoT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102656"},"PeriodicalIF":4.9,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145290008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-10 | DOI: 10.1016/j.compmedimag.2025.102654
Title: ESAM2-BLS: Enhanced segment anything model 2 for efficient breast lesion segmentation in ultrasound imaging
Authors: Lishuang Guo, Haonan Zhang, Chenbin Ma
Ultrasound imaging, as an economical, efficient, and non-invasive diagnostic tool, is widely used for breast lesion screening and diagnosis. However, segmentation of lesion regions remains a significant challenge due to factors such as noise interference and variability in image quality. To address this issue, we propose a novel deep learning model, the enhanced segment anything model 2 (SAM2) for breast lesion segmentation (ESAM2-BLS), an optimized version of the SAM2 architecture. ESAM2-BLS customizes and fine-tunes the pre-trained SAM2 model by introducing an adapter module specifically designed to accommodate the unique characteristics of breast ultrasound images. The adapter module directly addresses ultrasound-specific challenges, including speckle noise, low-contrast boundaries, shadowing artifacts, and anisotropic resolution, through targeted architectural elements such as channel attention mechanisms, specialized convolution kernels, and optimized skip connections. This design significantly improves segmentation accuracy, particularly for low-contrast and small lesion regions. Compared to traditional methods, ESAM2-BLS fully leverages the generalization capabilities of large models while incorporating multi-scale feature fusion and axial dilated depthwise convolution to effectively capture multi-level information from complex lesions. During decoding, the model enhances the identification of fine boundaries and small lesions through depthwise separable convolutions and skip connections while maintaining a low computational cost. In five-fold cross-validation on two datasets comprising over 1600 patients, ESAM2-BLS achieves average Dice scores of 0.9077 and 0.8633, and visualization of the segmentation results together with interpretability analysis further supports its accuracy and robustness. The model provides an efficient, reliable, and specialized automated solution for early breast cancer screening and diagnosis.
{"title":"ESAM2-BLS: Enhanced segment anything model 2 for efficient breast lesion segmentation in ultrasound imaging","authors":"Lishuang Guo , Haonan Zhang , Chenbin Ma","doi":"10.1016/j.compmedimag.2025.102654","DOIUrl":"10.1016/j.compmedimag.2025.102654","url":null,"abstract":"<div><div>Ultrasound imaging, as an economical, efficient, and non-invasive diagnostic tool, is widely used for breast lesion screening and diagnosis. However, the segmentation of lesion regions remains a significant challenge due to factors such as noise interference and the variability in image quality. To address this issue, we propose a novel deep learning model named enhanced segment anything model 2 (SAM2) for breast lesion segmentation (ESAM2-BLS). This model is an optimized version of the SAM2 architecture. ESAM2-BLS customizes and fine-tunes the pre-trained SAM2 model by introducing an adapter module, specifically designed to accommodate the unique characteristics of breast ultrasound images. The adapter module directly addresses ultrasound-specific challenges including speckle noise, low contrast boundaries, shadowing artifacts, and anisotropic resolution through targeted architectural elements such as channel attention mechanisms, specialized convolution kernels, and optimized skip connections. This optimization significantly improves segmentation accuracy, particularly for low-contrast and small lesion regions. Compared to traditional methods, ESAM2-BLS fully leverages the generalization capabilities of large models while incorporating multi-scale feature fusion and axial dilated depthwise convolution to effectively capture multi-level information from complex lesions. During the decoding process, the model enhances the identification of fine boundaries and small lesions through depthwise separable convolutions and skip connections, while maintaining a low computational cost. Visualization of the segmentation results and interpretability analysis demonstrate that ESAM2-BLS achieves an average Dice score of 0.9077 and 0.8633 in five-fold cross-validation across two datasets with over 1600 patients. These results significantly improve segmentation accuracy and robustness. This model provides an efficient, reliable, and specialized automated solution for early breast cancer screening and diagnosis.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102654"},"PeriodicalIF":4.9,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145356747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-09 | DOI: 10.1016/j.compmedimag.2025.102647
Title: Trends and applications of variational autoencoders in medical imaging analysis
Authors: Pauline Shan Qing Yeoh, Khairunnisa Hasikin, Xiang Wu, Siew Li Goh, Khin Wee Lai
Automated medical imaging analysis plays a crucial role in modern healthcare, with deep learning emerging as a widely adopted solution. However, traditional supervised learning methods often struggle to achieve optimal performance due to increasing challenges such as data scarcity and variability. In response, generative artificial intelligence has gained significant attention, particularly Variational Autoencoders (VAEs), which have been extensively utilized to address various challenges in medical imaging. This review analyzed 118 articles published in the Web of Science database between 2018 and 2024. Bibliometric analysis was conducted to map research trends, and a curated compilation of datasets and evaluation metrics was extracted to underscore the importance of standardization in deep learning workflows. VAEs have been applied across multiple healthcare applications, including anomaly detection, segmentation, classification, synthesis, registration, harmonization, and clustering. Findings suggest that VAE-based models are increasingly applied in medical imaging, with Magnetic Resonance Imaging emerging as the dominant modality and image synthesis as a primary application. The growing interest in this field highlights the potential of VAEs to enhance medical imaging analysis by overcoming existing limitations in data-driven healthcare solutions. This review serves as a valuable resource for researchers looking to integrate VAE models into healthcare applications, offering an overview of current advancements.
{"title":"Trends and applications of variational autoencoders in medical imaging analysis","authors":"Pauline Shan Qing Yeoh , Khairunnisa Hasikin , Xiang Wu , Siew Li Goh , Khin Wee Lai","doi":"10.1016/j.compmedimag.2025.102647","DOIUrl":"10.1016/j.compmedimag.2025.102647","url":null,"abstract":"<div><div>Automated medical imaging analysis plays a crucial role in modern healthcare, with deep learning emerging as a widely adopted solution. However, traditional supervised learning methods often struggle to achieve optimal performance due to increasing challenges such as data scarcity and variability. In response, generative artificial intelligence has gained significant attention, particularly Variational Autoencoders (VAEs), which have been extensively utilized to address various challenges in medical imaging. This review analyzed 118 articles published in the Web of Science database between 2018 and 2024. Bibliometric analysis was conducted to map research trends, while a curated compilation of datasets and evaluation metrics were extracted to underscore the importance of standardization in deep learning workflows. VAEs have been applied across multiple healthcare applications, including anomaly detection, segmentation, classification, synthesis, registration, harmonization, and clustering. Findings suggest that VAE-based models are increasingly applied in medical imaging, with Magnetic Resonance Imaging emerging as the dominant modality and image synthesis as a primary application. The growing interest in this field highlights the potential of VAEs to enhance medical imaging analysis by overcoming existing limitations in data-driven healthcare solutions. This review serves as a valuable resource for researchers looking to integrate VAE models into healthcare applications, offering an overview of current advancements.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"126 ","pages":"Article 102647"},"PeriodicalIF":4.9,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145290006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | DOI: 10.1016/j.compmedimag.2025.102648
Title: Twin-ViMReg: DXR driven synthetic dynamic Standing-CBCTs through Twin Vision Mamba-based 2D/3D registration
Authors: Jiashun Wang, Hao Tang, Zhan Wu, Yikun Zhang, Yan Xi, Yang Chen, Chunfeng Yang, Yixin Zhou, Hui Tang
Medical imaging of the knee joint under physiological weight bearing is crucial for diagnosing and analyzing knee lesions. Existing modalities have limitations: Standing Cone-Beam Computed Tomography (Standing-CBCT) provides high-resolution 3D data but requires a long acquisition time and captures only a single static view, while Dynamic X-ray Imaging (DXR) captures continuous motion but lacks 3D structural information. These limitations motivate dynamic 3D knee generation through 2D/3D registration of Standing-CBCT and DXR. Anatomically, although the femur, patella, and tibia-fibula each undergo rigid motion, the joint as a whole exhibits non-rigid behavior; consequently, existing rigid or non-rigid 2D/3D registration methods fail to fully address this scenario. We propose Twin-ViMReg, a twin-stream 2D/3D registration framework for multiple correlated objects in the knee joint. It extends the conventional 2D/3D registration paradigm by establishing a pair of twinned sub-tasks. By introducing a Multi-Objective Spatial Transformation (MOST) module, it models inter-object correlations and enhances registration robustness, while a Vision Mamba-based encoder strengthens the representation capacity of the method. We used 1,500 simulated data pairs from 10 patients for training and 56 real data pairs from 3 patients for testing. Quantitative evaluation shows that the mean target registration error (TRE) reached 3.36 mm and the RSR was 8.93% higher than that of state-of-the-art methods. With an average computation time of 1.22 s per X-ray image, Twin-ViMReg enables efficient 2D/3D knee joint registration within seconds, making it a practical and promising solution.
{"title":"Twin-ViMReg: DXR driven synthetic dynamic Standing-CBCTs through Twin Vision Mamba-based 2D/3D registration","authors":"Jiashun Wang , Hao Tang , Zhan Wu , Yikun Zhang , Yan Xi , Yang Chen , Chunfeng Yang , Yixin Zhou , Hui Tang","doi":"10.1016/j.compmedimag.2025.102648","DOIUrl":"10.1016/j.compmedimag.2025.102648","url":null,"abstract":"<div><div>Medical imaging of the knee joint under physiological weight bearing is crucial for diagnosing and analyzing knee lesions. Existing modalities have limitations: Standing Cone-Beam Computed Tomography (Standing-CBCT) provides high-resolution 3D data but with long acquisition time and only a single static view, while Dynamic X-ray Imaging (DXR) captures continuous motion but lacks 3D structural information. These limitations motivate the need for dynamic 3D knee generation through 2D/3D registration of Standing-CBCT and DXR. Anatomically, although the femur, patella, and tibia–fibula undergo rigid motion, the joint as a whole exhibits non-rigid behavior. Consequently, existing rigid or non-rigid 2D/3D registration methods fail to fully address this scenario. We propose Twin-ViMReg, a twin-stream 2D/3D registration framework for multiple correlated objects in the knee joint. It extends conventional 2D/3D registration paradigm by establishing a pair of twined sub-tasks. By introducing a Multi-Objective Spatial Transformation (MOST) module, it models inter-object correlations and enhances registration robustness. The Vision Mamba-based encoder also strengthens the representation capacity of the method. We used 1,500 simulated data pairs from 10 patients for training and 56 real data pairs from 3 patients for testing. Quantitative evaluation shows that the mean TRE reached 3.36 mm, the RSR was 8.93% higher than the SOTA methods. With an average computation time of 1.22 s per X-ray image, Twin-ViMReg enables efficient 2D/3D knee joint registration within seconds, making it a practical and promising solution.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102648"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145214148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | DOI: 10.1016/j.compmedimag.2025.102650
Title: Collect vascular specimens in one cabinet: A hierarchical prompt-guided universal model for 3D vascular segmentation
Authors: Yinuo Wang, Cai Meng, Zhe Xu
Accurate segmentation of vascular structures in volumetric medical images is critical for disease diagnosis and surgical planning. While deep neural networks have shown remarkable effectiveness, existing methods often rely on separate models tailored to specific modalities and anatomical regions, resulting in redundant parameters and limited generalization. Recent universal models address broader segmentation tasks but struggle with the unique challenges of vascular structures. To overcome these limitations, we first present VasBench, a new comprehensive vascular segmentation benchmark comprising nine sub-datasets spanning diverse modalities and anatomical regions. Building on this foundation, we introduce VasCab, a novel prompt-guided universal model for volumetric vascular segmentation, designed to “collect vascular specimens in one cabinet”. Specifically, VasCab is equipped with learnable domain and topology prompts to capture shared and unique vascular characteristics across diverse data domains, complemented by morphology perceptual loss to address complex morphological variations. Experimental results demonstrate that VasCab surpasses individual models and state-of-the-art medical foundation models across all test datasets, showcasing exceptional cross-domain integration and precise modeling of vascular morphological variations. Moreover, VasCab exhibits robust performance in downstream tasks, underscoring its versatility and potential for unified vascular analysis. This study marks a significant step toward universal vascular segmentation, offering a promising solution for unified vascular analysis across heterogeneous datasets. Code and dataset are available at https://github.com/mileswyn/VasCab.
{"title":"Collect vascular specimens in one cabinet: A hierarchical prompt-guided universal model for 3D vascular segmentation","authors":"Yinuo Wang , Cai Meng , Zhe Xu","doi":"10.1016/j.compmedimag.2025.102650","DOIUrl":"10.1016/j.compmedimag.2025.102650","url":null,"abstract":"<div><div>Accurate segmentation of vascular structures in volumetric medical images is critical for disease diagnosis and surgical planning. While deep neural networks have shown remarkable effectiveness, existing methods often rely on separate models tailored to specific modalities and anatomical regions, resulting in redundant parameters and limited generalization. Recent universal models address broader segmentation tasks but struggle with the unique challenges of vascular structures. To overcome these limitations, we first present <strong>VasBench</strong>, a new comprehensive vascular segmentation benchmark comprising nine sub-datasets spanning diverse modalities and anatomical regions. Building on this foundation, we introduce <strong>VasCab</strong>, a novel prompt-guided universal model for volumetric vascular segmentation, designed to “collect vascular specimens in one cabinet”. Specifically, VasCab is equipped with learnable domain and topology prompts to capture shared and unique vascular characteristics across diverse data domains, complemented by morphology perceptual loss to address complex morphological variations. Experimental results demonstrate that VasCab surpasses individual models and state-of-the-art medical foundation models across all test datasets, showcasing exceptional cross-domain integration and precise modeling of vascular morphological variations. Moreover, VasCab exhibits robust performance in downstream tasks, underscoring its versatility and potential for unified vascular analysis. This study marks a significant step toward universal vascular segmentation, offering a promising solution for unified vascular analysis across heterogeneous datasets. Code and dataset are available at <span><span>https://github.com/mileswyn/VasCab</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102650"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145201977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | DOI: 10.1016/j.compmedimag.2025.102651
Title: Enhancing intracranial vessel segmentation using diffusion models without manual annotation for 3D Time-of-Flight Magnetic Resonance Angiography
Authors: Jonghun Kim, Inye Na, Jiwon Chung, Ha-Na Song, Kyungseo Kim, Seongvin Ju, Mi-Yeon Eun, Woo-Keun Seo, Hyunjin Park
Intracranial vessel segmentation is essential for managing brain disorders, facilitating early detection and precise intervention for stroke and aneurysm. Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) is a commonly used vascular imaging technique for segmenting brain vessels. Traditional rule-based MRA segmentation methods are efficient but suffer from instability and poor performance. Deep learning models, including diffusion models, have recently gained attention in medical image segmentation; however, they require ground truth for training, which is labor-intensive and time-consuming to obtain. We propose a novel segmentation method that combines the strengths of rule-based and diffusion models to improve segmentation without relying on explicit labels. Our model adopts a Frangi filter to aid vessel detection and modifies the diffusion model to exclude memory-intensive attention modules, improving efficiency. A condition network concatenates the feature maps to further enhance the segmentation process. Quantitative and qualitative evaluations on two datasets demonstrate that our approach not only maintains the integrity of the vascular regions but also substantially reduces noise, offering a robust solution for segmenting intracranial vessels. Our results suggest a basis for improved patient care in disorders involving brain vessels. Our code is available at github.com/jongdory/Vessel-Diffusion.
{"title":"Enhancing intracranial vessel segmentation using diffusion models without manual annotation for 3D Time-of-Flight Magnetic Resonance Angiography","authors":"Jonghun Kim , Inye Na , Jiwon Chung , Ha-Na Song , Kyungseo Kim , Seongvin Ju , Mi-Yeon Eun , Woo-Keun Seo , Hyunjin Park","doi":"10.1016/j.compmedimag.2025.102651","DOIUrl":"10.1016/j.compmedimag.2025.102651","url":null,"abstract":"<div><div>Intracranial vessel segmentation is essential for managing brain disorders, facilitating early detection and precise intervention of stroke and aneurysm. Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) is a commonly used vascular imaging technique for segmenting brain vessels. Traditional rule-based MRA segmentation methods were efficient, but suffered from instability and poor performance. Deep learning models, including diffusion models, have recently gained attention in medical image segmentation. However, they require ground truth for training, which is labor-intensive and time-consuming to obtain. We propose a novel segmentation method that combines the strengths of rule-based and diffusion models to improve segmentation without relying on explicit labels. Our model adopts a Frangi filter to help with vessel detection and modifies the diffusion models to exclude memory-intensive attention modules to improve efficiency. Our condition network concatenates the feature maps to further enhance the segmentation process. Quantitative and qualitative evaluations on two datasets demonstrate that our approach not only maintains the integrity of the vascular regions but also substantially reduces noise, offering a robust solution for segmenting intracranial vessels. Our results suggest a basis for improved patient care in disorders involving brain vessels. Our code is available at <span><span>github.com/jongdory/Vessel-Diffusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102651"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145259815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}