Coronary artery calcification segmentation with sparse annotations in intravascular OCT: Leveraging self-supervised learning and consistency regularization
Pub Date: 2025-10-01. DOI: 10.1016/j.compmedimag.2025.102653
Chao Li , Zhifeng Qin , Zhenfei Tang , Yidan Wang , Bo Zhang , Jinwei Tian , Zhao Wang
Assessing coronary artery calcification (CAC) is crucial in evaluating the progression of atherosclerosis and planning percutaneous coronary intervention (PCI). Intravascular Optical Coherence Tomography (OCT) is a commonly used imaging tool for evaluating CAC at the micrometer scale and in three dimensions for optimizing PCI. While existing deep learning methods have proven effective in OCT image analysis, they are hindered by the lack of large-scale, high-quality labels needed to train deep neural networks that reach human-level performance in practice. In this work, we propose an annotation-efficient approach for segmenting CAC in intravascular OCT images, leveraging self-supervised learning and consistency regularization. We employ a transformer encoder paired with a simple linear projection layer for self-supervised pre-training on unlabeled OCT data. Subsequently, a transformer-based segmentation model is fine-tuned on sparsely annotated OCT pullbacks with a contrast loss using a combination of unlabeled and labeled data. We collected 2,549,073 unlabeled OCT images from 7,108 OCT pullbacks for pre-training, and 1,106,347 sparsely annotated OCT images from 3,025 OCT pullbacks for model training and testing. The proposed approach consistently outperformed existing sparsely supervised methods on both internal and external datasets. In addition, extensive comparisons under full, partial, and sparse annotation schemes substantiated its high annotation efficiency. With an 80% reduction in image-labeling effort, our method has the potential to expedite the development of deep learning models for processing large-scale medical image data.
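The fine-tuning stage described above mixes supervised loss on sparsely labeled frames with a consistency term on unlabeled frames. The abstract does not give the training code, so the snippet below is only a minimal PyTorch sketch of that general semi-supervised pattern; the `model`, the pixel-preserving `augment` callable, and the loss weight `lam` are placeholder assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def finetune_step(model, labeled_imgs, labels, unlabeled_imgs, augment, optimizer, lam=0.1):
    """One semi-supervised step: supervised cross-entropy on labeled frames plus a
    consistency penalty between two augmented views of unlabeled frames.
    `augment` must preserve pixel correspondence (e.g., intensity jitter only)."""
    model.train()
    # Supervised term on the sparsely annotated frames.
    sup_logits = model(labeled_imgs)                       # (B, C, H, W)
    sup_loss = F.cross_entropy(sup_logits, labels)         # labels: (B, H, W) class indices

    # Consistency term: predictions on two random views should agree.
    view1, view2 = augment(unlabeled_imgs), augment(unlabeled_imgs)
    p1 = F.softmax(model(view1), dim=1)
    with torch.no_grad():                                  # treat one view as the target
        p2 = F.softmax(model(view2), dim=1)
    cons_loss = F.mse_loss(p1, p2)

    loss = sup_loss + lam * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```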
{"title":"Coronary artery calcification segmentation with sparse annotations in intravascular OCT: Leveraging self-supervised learning and consistency regularization","authors":"Chao Li , Zhifeng Qin , Zhenfei Tang , Yidan Wang , Bo Zhang , Jinwei Tian , Zhao Wang","doi":"10.1016/j.compmedimag.2025.102653","DOIUrl":"10.1016/j.compmedimag.2025.102653","url":null,"abstract":"<div><div>Assessing coronary artery calcification (CAC) is crucial in evaluating the progression of atherosclerosis and planning percutaneous coronary intervention (PCI). Intravascular Optical Coherence Tomography (OCT) is a commonly used imaging tool for evaluating CAC at micrometer-scale level and in three-dimensions for optimizing PCI. While existing deep learning methods have proven effective in OCT image analysis, they are hindered by the lack of large-scale, high-quality labels to train deep neural networks that can reach human level performance in practice. In this work, we propose an annotation-efficient approach for segmenting CAC in intravascular OCT images, leveraging self-supervised learning and consistency regularization. We employ a transformer encoder paired with a simple linear projection layer for self-supervised pre-training on unlabeled OCT data. Subsequently, a transformer-based segmentation model is fine-tuned on sparsely annotated OCT pullbacks with a contrast loss using a combination of unlabeled and labeled data. We collected 2,549,073 unlabeled OCT images from 7,108 OCT pullbacks for pre-training, and 1,106,347 sparsely annotated OCT images from 3,025 OCT pullbacks for model training and testing. The proposed approach consistently outperformed existing sparsely supervised methods on both internal and external datasets. In addition, extensive comparisons under full, partial, and sparse annotation schemes substantiated its high annotation efficiency. With 80% reduction in image labeling efforts, our method has the potential to expedite the development of deep learning models for processing large-scale medical image data.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102653"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145267115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SA2Net: Scale-adaptive structure-affinity transformation for spine segmentation from ultrasound volume projection imaging
Pub Date: 2025-10-01. DOI: 10.1016/j.compmedimag.2025.102649
Hao Xie , Zixun Huang , Yushen Zuo , Yakun Ju , Frank H.F. Leung , N.F. Law , Kin-Man Lam , Yong-Ping Zheng , Sai Ho Ling
Spine segmentation, based on ultrasound volume projection imaging (VPI), plays a vital role in intelligent scoliosis diagnosis in clinical applications. However, this task faces several significant challenges. Firstly, the global contextual knowledge of spines may not be well learned if the high spatial correlation of different bone features is neglected. Secondly, the spine bones contain rich structural knowledge regarding their shapes and positions, which deserves to be encoded into the segmentation process. To address these challenges, we propose a novel scale-adaptive structure-aware network (SA2Net) for effective spine segmentation. First, we propose a scale-adaptive complementary strategy to learn the cross-dimensional long-distance correlation features for spinal images. Second, motivated by the consistency between multi-head self-attention in Transformers and semantic-level affinity, we propose a structure-affinity transformation that transforms semantic features with class-specific affinity and combine it with a Transformer decoder for structure-aware reasoning. In addition, we adopt a feature mixing loss aggregation method to enhance model training. This method improves the robustness and accuracy of the segmentation process. The experimental results demonstrate that our SA2Net achieves superior segmentation performance compared to other state-of-the-art methods. Moreover, the adaptability of SA2Net to various backbones enhances its potential as a promising tool for advanced scoliosis diagnosis using intelligent spinal image analysis.
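The abstract links multi-head self-attention to semantic affinity but does not spell out the structure-affinity transformation itself. The snippet below only illustrates the general idea commonly used in segmentation work: refining coarse per-pixel class scores with an affinity matrix derived from self-attention weights. All shapes and the row-normalization choice are illustrative assumptions, not SA2Net's actual module.

```python
import torch

def affinity_refine(attn, coarse_scores):
    """Refine coarse per-pixel class scores with an attention-derived affinity.
    attn:          (B, heads, N, N) self-attention weights over N spatial tokens
    coarse_scores: (B, N, num_classes) coarse class scores for the same tokens
    Returns refined scores of shape (B, N, num_classes)."""
    affinity = attn.mean(dim=1)                                           # (B, N, N): average heads
    affinity = affinity / affinity.sum(-1, keepdim=True).clamp_min(1e-6)  # row-normalize
    refined = torch.bmm(affinity, coarse_scores)                          # propagate scores along affinity
    return refined

# Toy usage with random tensors (shapes only).
B, H, N, C = 2, 8, 64, 5
refined = affinity_refine(torch.rand(B, H, N, N), torch.rand(B, N, C))
print(refined.shape)  # torch.Size([2, 64, 5])
```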
{"title":"SA2Net: Scale-adaptive structure-affinity transformation for spine segmentation from ultrasound volume projection imaging","authors":"Hao Xie , Zixun Huang , Yushen Zuo , Yakun Ju , Frank H.F. Leung , N.F. Law , Kin-Man Lam , Yong-Ping Zheng , Sai Ho Ling","doi":"10.1016/j.compmedimag.2025.102649","DOIUrl":"10.1016/j.compmedimag.2025.102649","url":null,"abstract":"<div><div>Spine segmentation, based on ultrasound volume projection imaging (VPI), plays a vital role for intelligent scoliosis diagnosis in clinical applications. However, this task faces several significant challenges. Firstly, the global contextual knowledge of spines may not be well-learned if we neglect the high spatial correlation of different bone features. Secondly, the spine bones contain rich structural knowledge regarding their shapes and positions, which deserves to be encoded into the segmentation process. To address these challenges, we propose a novel scale-adaptive structure-aware network (SA<sup>2</sup>Net) for effective spine segmentation. First, we propose a scale-adaptive complementary strategy to learn the cross-dimensional long-distance correlation features for spinal images. Second, motivated by the consistency between multi-head self-attention in Transformers and semantic level affinity, we propose structure-affinity transformation to transform semantic features with class-specific affinity and combine it with a Transformer decoder for structure-aware reasoning. In addition, we adopt a feature mixing loss aggregation method to enhance model training. This method improves the robustness and accuracy of the segmentation process. The experimental results demonstrate that our SA<sup>2</sup>Net achieves superior segmentation performance compared to other state-of-the-art methods. Moreover, the adaptability of SA<sup>2</sup>Net to various backbones enhances its potential as a promising tool for advanced scoliosis diagnosis using intelligent spinal image analysis.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102649"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning for automatic vertebra analysis: A methodological survey of recent advances
Pub Date: 2025-10-01. DOI: 10.1016/j.compmedimag.2025.102652
Zhuofan Xie , Zishan Lin , Enlong Sun , Fengyi Ding , Jie Qi , Shen Zhao
Automated vertebra analysis (AVA), encompassing vertebra detection and segmentation, plays a critical role in computer-aided diagnosis, surgical planning, and postoperative evaluation in spine-related clinical workflows. Despite notable progress, AVA continues to face key challenges, including variations in the field of view (FOV), complex vertebral morphology, limited availability of high-quality annotated data, and performance degradation under domain shifts. Over the past decade, numerous studies have employed deep learning (DL) to tackle these issues, introducing advanced network architectures and innovative learning paradigms. However, the rapid evolution of these methods has not been comprehensively captured by existing surveys, resulting in a knowledge gap regarding the current state of the field. To address this, this paper presents an up-to-date review that systematically summarizes recent advances. The review begins by consolidating publicly available datasets and evaluation metrics to support standardized benchmarking. Recent DL-based AVA approaches are then analyzed from two methodological perspectives: network architecture improvement and learning strategy design. Finally, an examination of persistent technical barriers and emerging clinical needs that are shaping future research directions is provided. These include multimodal learning, domain generalization, and the integration of foundation models. As the most current survey in the field, this review provides a comprehensive and structured synthesis aimed at guiding future research toward the development of robust, generalizable, and clinically deployable AVA systems in the era of intelligent medical imaging.
{"title":"Deep learning for automatic vertebra analysis: A methodological survey of recent advances","authors":"Zhuofan Xie , Zishan Lin , Enlong Sun , Fengyi Ding , Jie Qi , Shen Zhao","doi":"10.1016/j.compmedimag.2025.102652","DOIUrl":"10.1016/j.compmedimag.2025.102652","url":null,"abstract":"<div><div>Automated vertebra analysis (AVA), encompassing vertebra detection and segmentation, plays a critical role in computer-aided diagnosis, surgical planning, and postoperative evaluation in spine-related clinical workflows. Despite notable progress, AVA continues to face key challenges, including variations in the field of view (FOV), complex vertebral morphology, limited availability of high-quality annotated data, and performance degradation under domain shifts. Over the past decade, numerous studies have employed deep learning (DL) to tackle these issues, introducing advanced network architectures and innovative learning paradigms. However, the rapid evolution of these methods has not been comprehensively captured by existing surveys, resulting in a knowledge gap regarding the current state of the field. To address this, this paper presents an up-to-date review that systematically summarizes recent advances. The review begins by consolidating publicly available datasets and evaluation metrics to support standardized benchmarking. Recent DL-based AVA approaches are then analyzed from two methodological perspectives: network architecture improvement and learning strategies design. Finally, an examination of persistent technical barriers and emerging clinical needs that are shaping future research directions is provided. These include multimodal learning, domain generalization, and the integration of foundation models. As the most current survey in the field, this review provides a comprehensive and structured synthesis aimed at guiding future research toward the development of robust, generalizable, and clinically deployable AVA systems in the era of intelligent medical imaging.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102652"},"PeriodicalIF":4.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145208514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SGRRG: Leveraging radiology scene graphs for improved and abnormality-aware radiology report generation
Pub Date: 2025-09-15. DOI: 10.1016/j.compmedimag.2025.102644
Jun Wang , Lixing Zhu , Abhir Bhalerao , Yulan He
Radiology report generation (RRG) methods often lack sufficient medical knowledge to produce clinically accurate reports. A scene graph provides comprehensive information for describing objects within an image. However, automatically generated radiology scene graphs (RSG) may contain noisy annotations and highly overlapping regions, posing challenges in utilizing the RSG to enhance RRG. To this end, we propose Scene Graph aided RRG (SGRRG), a framework that leverages an automatically generated RSG and copes with the noisy supervision problem in the RSG with a transformer-based module, effectively distilling medical knowledge in an end-to-end manner. SGRRG is composed of a dedicated scene graph encoder responsible for translating the radiograph into an RSG, and a scene graph-aided decoder that takes advantage of both patch-level and region-level visual information and mitigates the noisy annotation problem in the RSG. The incorporation of both patch-level and region-level features, alongside the integration of the essential RSG construction modules, enhances our framework's flexibility and robustness, enabling it to readily exploit prior advanced RRG techniques. A fine-grained, sentence-level attention method is designed to better distill the RSG information. Additionally, we introduce two proxy tasks to enhance the model's ability to produce clinically accurate reports. Extensive experiments demonstrate that SGRRG outperforms previous state-of-the-art methods in report generation and can better capture abnormal findings. Code is available at https://github.com/Markin-Wang/SGRRG.
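The fine-grained, sentence-level attention is described only at a high level; the released code at the repository above is the authoritative source. The sketch below shows one conventional way a decoder's sentence embeddings could attend over scene-graph region features with standard cross-attention. The module name, dimensions, and masking convention are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SentenceRegionAttention(nn.Module):
    """Illustrative sentence-level attention over scene-graph region features:
    each sentence embedding queries the region features, yielding a distilled
    region context vector per sentence."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, sent_emb, region_feats, region_mask=None):
        # sent_emb:     (B, S, dim)  one embedding per generated sentence
        # region_feats: (B, R, dim)  RSG region features (possibly noisy/overlapping)
        # region_mask:  (B, R) True where a region should be ignored
        ctx, weights = self.attn(sent_emb, region_feats, region_feats,
                                 key_padding_mask=region_mask)
        return ctx, weights  # (B, S, dim), (B, S, R)

ctx, w = SentenceRegionAttention()(torch.randn(2, 4, 512), torch.randn(2, 30, 512))
print(ctx.shape, w.shape)  # torch.Size([2, 4, 512]) torch.Size([2, 4, 30])
```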
{"title":"SGRRG: Leveraging radiology scene graphs for improved and abnormality-aware radiology report generation","authors":"Jun Wang , Lixing Zhu , Abhir Bhalerao , Yulan He","doi":"10.1016/j.compmedimag.2025.102644","DOIUrl":"10.1016/j.compmedimag.2025.102644","url":null,"abstract":"<div><div>Radiology report generation (RRG) methods often lack sufficient medical knowledge to produce clinically accurate reports. A scene graph provides comprehensive information for describing objects within an image. However, automatically generated radiology scene graphs (RSG) may contain noise annotations and highly overlapping regions, posing challenges in utilizing RSG to enhance RRG. To this end, we propose Scene Graph aided RRG (SGRRG), a framework that leverages an automatically generated RSG and copes with noisy supervision problems in the RSG with a transformer-based module, effectively distilling medical knowledge in an end-to-end manner. SGRRG is composed of a dedicated scene graph encoder responsible for translating the radiography into a RSG, and a scene graph-aided decoder that takes advantage of both patch-level and region-level visual information and mitigates the noisy annotation problem in the RSG. The incorporation of both patch-level and region-level features, alongside the integration of the essential RSG construction modules, enhances our framework’s flexibility and robustness, enabling it to readily exploit prior advanced RRG techniques. A fine-grained, sentence-level attention method is designed to better distill the RSG information. Additionally, we introduce two proxy tasks to enhance the model’s ability to produce clinically accurate reports. Extensive experiments demonstrate that SGRRG outperforms previous state-of-the-art methods in report generation and can better capture abnormal findings. Code is available at <span><span>https://github.com/Markin-Wang/SGRRG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102644"},"PeriodicalIF":4.9,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unveiling hidden risks: A Holistically-Driven Weak Supervision framework for ultra-short-term ACS prediction using CCTA
Pub Date: 2025-09-15. DOI: 10.1016/j.compmedimag.2025.102636
Zhen Liu , Bangkang Fu , Jiahui Mao , Junjie He , Jiangyue Xiang , Hongjin Li , Yunsong Peng , Bangguo Li , Rongpin Wang
This paper proposes MH-STR, a novel end-to-end framework for predicting the three-month risk of Acute Coronary Syndrome (ACS) from Coronary CT Angiography (CCTA) images. The model combines hybrid attention mechanisms with convolutional networks to capture subtle and irregular lesion patterns that are difficult to detect visually. A stage-wise transfer learning strategy helps distill general features and transfer vascular-specific knowledge. To reconcile feature scale mismatches in the dual-branch architecture, we introduce a wavelet-based multi-scale fusion module for effective integration across scales. Experiments show that MH-STR achieves an AUC of 0.834, an F1 score of 0.82, and a precision of 0.92, outperforming existing methods and highlighting its potential for improving ACS risk prediction.
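The wavelet-based multi-scale fusion module is only named in the abstract. The snippet below sketches the basic building block such a module could rest on: a single-level Haar decomposition that halves a high-resolution feature map so it can be fused with a lower-resolution branch. The manual Haar transform, the 1×1-convolution fusion head, and all channel sizes are assumptions for illustration, not MH-STR's actual design.

```python
import torch
import torch.nn as nn

def haar_dwt2(x):
    """Single-level 2D Haar DWT on (B, C, H, W); H and W must be even.
    Returns the approximation band cA and the detail bands, each (B, C, H/2, W/2)."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    cA = (a + b + c + d) / 2
    cH = (a + b - c - d) / 2
    cV = (a - b + c - d) / 2
    cD = (a - b - c + d) / 2
    return cA, (cH, cV, cD)

class WaveletFusion(nn.Module):
    """Illustrative cross-scale fusion: bring a high-res branch down to the
    low-res branch's scale via the Haar approximation band, then fuse."""
    def __init__(self, c_hi, c_lo):
        super().__init__()
        self.fuse = nn.Conv2d(c_hi + c_lo, c_lo, kernel_size=1)

    def forward(self, feat_hi, feat_lo):
        cA, _ = haar_dwt2(feat_hi)               # (B, c_hi, H/2, W/2)
        return self.fuse(torch.cat([cA, feat_lo], dim=1))

out = WaveletFusion(64, 128)(torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16))
print(out.shape)  # torch.Size([1, 128, 16, 16])
```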
{"title":"Unveiling hidden risks: A Holistically-Driven Weak Supervision framework for ultra-short-term ACS prediction using CCTA","authors":"Zhen Liu , Bangkang Fu , Jiahui Mao , Junjie He , Jiangyue Xiang , Hongjin Li , Yunsong Peng , Bangguo Li , Rongpin Wang","doi":"10.1016/j.compmedimag.2025.102636","DOIUrl":"10.1016/j.compmedimag.2025.102636","url":null,"abstract":"<div><div>This paper proposes MH-STR, a novel end-to-end framework for predicting the three-month risk of Acute Coronary Syndrome (ACS) from Coronary CT Angiography (CCTA) images. The model combines hybrid attention mechanisms with convolutional networks to capture subtle and irregular lesion patterns that are difficult to detect visually. A stage-wise transfer learning strategy helps distill general features and transfer vascular-specific knowledge. To reconcile feature scale mismatches in the dual-branch architecture, we introduce a wavelet-based multi-scale fusion module for effective integration across scales. Experiments show that MH-STR achieves an AUC of 0.834, an F1 score of 0.82, and a precision of 0.92, outperforming existing methods and highlighting its potential for improving ACS risk prediction.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102636"},"PeriodicalIF":4.9,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145088019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A segmentation-based hierarchical feature interaction attention model for gene mutation status identification in colorectal cancer
Pub Date: 2025-09-13. DOI: 10.1016/j.compmedimag.2025.102646
Yu Miao , Sijie Song , Lin Zhao , Jun Zhao , Yingsen Wang , Ran Gong , Yan Qiang , Hua Zhang , Juanjuan Zhao
Precise identification of Kirsten Rat Sarcoma (KRAS) gene mutation status is critical for both qualitative analysis of colorectal cancer and formulation of personalized therapeutic regimens. In this paper, we propose a Segmentation-based Hierarchical feature Interaction Attention Model (SHIAM) that synergizes multi-task learning with hierarchical feature integration, aiming to achieve accurate prediction of KRAS gene mutation status. Specifically, we integrate segmentation and classification tasks, sharing feature representations between them. To fully focus on the lesion areas at different levels and their potential associations, we design a multi-level synergistic attention block that enables adaptive fusion of lesion characteristics of varying granularity with their contextual associations. To transcend the constraints of conventional methodologies in modeling long-range relationships, we design a global collaborative interaction attention module, an efficient, improved long-range perception Transformer. As the core component of this module, the long-range perception block provides robust support for mining feature integrity through its strong perception ability. Furthermore, we introduce a hybrid feature engineering strategy that integrates hand-crafted features encoded as statistical information entropy with automatically learned deep representations, thereby establishing a complementary feature space. SHIAM has been rigorously trained and validated on the colorectal cancer dataset provided by Shanxi Cancer Hospital. The results show that it achieves an accuracy of 89.42% and an AUC of 95.89% in KRAS gene mutation status prediction, with comprehensive performance superior to all current non-invasive assays. In clinical practice, our model can enable computer-aided diagnosis, effectively assisting physicians in formulating suitable personalized treatment plans for patients.
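SHIAM couples segmentation and classification through shared representations; its actual blocks are not given in the abstract. The snippet shows only the generic shared-encoder, two-head multi-task pattern with a joint loss that such a design builds on. The toy encoder, head sizes, and loss weighting are placeholder assumptions, not SHIAM's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderMultiTask(nn.Module):
    """Generic multi-task pattern: one encoder feeds both a segmentation decoder
    (lesion mask) and a classification head (e.g., KRAS mutated vs. wild-type)."""
    def __init__(self, in_ch=1, feat=32, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 2, stride=2), nn.ReLU(),
            nn.Conv2d(feat, 1, 1))                      # lesion mask logits
        self.cls_head = nn.Linear(feat, n_classes)      # mutation-status logits

    def forward(self, x):
        z = self.encoder(x)
        seg_logits = self.seg_head(z)
        cls_logits = self.cls_head(z.mean(dim=(2, 3)))  # global average pooling
        return seg_logits, cls_logits

def joint_loss(seg_logits, mask, cls_logits, label, alpha=0.5):
    # Weighted sum of segmentation and classification losses; alpha is a placeholder.
    return (F.binary_cross_entropy_with_logits(seg_logits, mask)
            + alpha * F.cross_entropy(cls_logits, label))

seg, cls = SharedEncoderMultiTask()(torch.randn(2, 1, 64, 64))
print(seg.shape, cls.shape)  # torch.Size([2, 1, 64, 64]) torch.Size([2, 2])
```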
{"title":"A segmentation-based hierarchical feature interaction attention model for gene mutation status identification in colorectal cancer","authors":"Yu Miao , Sijie Song , Lin Zhao , Jun Zhao , Yingsen Wang , Ran Gong , Yan Qiang , Hua Zhang , Juanjuan Zhao","doi":"10.1016/j.compmedimag.2025.102646","DOIUrl":"10.1016/j.compmedimag.2025.102646","url":null,"abstract":"<div><div>Precise identification of Kirsten Rat Sarcoma (KRAS) gene mutation status is critical for both qualitative analysis of colorectal cancer and formulation of personalized therapeutic regimens. In this paper, we propose a Segmentation-based Hierarchical feature Interaction Attention Model (SHIAM) that synergizes multi-task learning with hierarchical feature integration, aiming to achieve accurate prediction of the KRAS gene mutation status. Specifically, we integrate segmentation and classification tasks, sharing feature representations between them. To fully focus on the lesion areas at different levels and their potential associations, we design a multi-level synergistic attention block that enables adaptive fusion of lesion characteristics of varying granularity with their contextual associations. To transcend the constraints of conventional methodologies in modeling long-range relationships, we design a global collaborative interaction attention module, an efficient improved long-range perception Transformer. As the core component of module, the long-range perception block provides robust support for mining feature integrity with its excellent perception ability. Furthermore, we introduce a hybrid feature engineering strategy that integrates hand-crafted features encoded as statistical information entropy with automatically learned deep representations, thereby establishing a complementary feature space. Our SHIAM has been rigorously trained and verified on the colorectal cancer dataset provided by Shanxi Cancer Hospital. The results show that it achieves an accuracy of 89.42% and an AUC value of 95.89% in KRAS gene mutation status prediction, with comprehensive performance superior to all current non-invasive assays. In clinical practice, our model possesses the capability to enable computer-aided diagnosis, effectively assisting physicians in formulating suitable personalized treatment plans for patients.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102646"},"PeriodicalIF":4.9,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A self-attention model for robust rigid slice-to-volume registration of functional MRI
Pub Date: 2025-09-13. DOI: 10.1016/j.compmedimag.2025.102643
Samah Khawaled , Onur Afacan , Simon K. Warfield , Moti Freiman
Functional Magnetic Resonance Imaging (fMRI) is vital in neuroscience, enabling investigations into brain disorders, treatment monitoring, and brain function mapping. However, head motion during fMRI scans, occurring between shots of slice acquisition, can result in distortion, biased analyses, and increased costs due to the need for scan repetitions. Therefore, retrospective slice-level motion correction through slice-to-volume registration (SVR) is crucial. Previous studies have utilized deep learning (DL)-based models to address the SVR task; however, they overlooked the uncertainty stemming from the input stack of slices and did not assign weighting or scoring to each slice. Treating all slices equally ignores the variability in their relevance, leading to suboptimal predictions. In this work, we introduce an end-to-end SVR model for aligning 2D fMRI slices with a 3D reference volume, incorporating a self-attention mechanism to enhance robustness against input data variations and uncertainties. Our SVR model utilizes independent slice and volume encoders and a self-attention module to assign pixel-wise scores for each slice. We used the publicly available Healthy Brain Network (HBN) dataset and split the volumes into training (64%), validation (16%), and test (20%) sets. To conduct the simulated motion study, we synthesized rigid transformations across a wide range of parameters and applied them to the reference volumes. Slices were then sampled according to the acquisition protocol to generate 2,000, 500, and 200 3D volume–2D slice pairs for the training, validation, and test sets, respectively. Our experimental results demonstrate that our model achieves competitive performance in terms of alignment accuracy compared to state-of-the-art deep learning-based methods (Euclidean distance of 0.93 [mm] vs. 1.86 [mm]; paired t-test, p < 0.03). Furthermore, our approach exhibits faster registration speed compared to conventional iterative methods (0.096 s vs. 1.17 s). Our end-to-end SVR model facilitates real-time head motion tracking during fMRI acquisition, ensuring reliability and robustness against uncertainties in the inputs.
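The simulated-motion study applies synthesized rigid transformations to reference volumes and then samples slices. The paper's parameter ranges and slice protocol are not reproduced here, so the rotation/translation ranges, interpolation order, and slice-sampling step in the SciPy sketch below are illustrative assumptions of how such volume–slice pairs could be generated.

```python
import numpy as np
from scipy.ndimage import affine_transform
from scipy.spatial.transform import Rotation

def random_rigid_volume(volume, max_rot_deg=10.0, max_shift_vox=5.0, seed=None):
    """Apply a random rigid (rotation + translation) transform to a 3D volume.
    The ranges below are illustrative, not the paper's actual settings."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(-max_rot_deg, max_rot_deg, size=3)
    shift = rng.uniform(-max_shift_vox, max_shift_vox, size=3)
    R = Rotation.from_euler("xyz", angles, degrees=True).as_matrix()

    center = (np.array(volume.shape) - 1) / 2.0
    # affine_transform maps output coords to input coords: input = R @ output + offset,
    # so rotate about the volume center and then translate.
    offset = center - R @ center + shift
    moved = affine_transform(volume, R, offset=offset, order=1, mode="nearest")
    return moved, angles, shift

def sample_slices(volume, step=4):
    """Toy stand-in for the acquisition protocol: take every `step`-th axial slice."""
    return volume[::step]

vol = np.random.rand(64, 96, 96).astype(np.float32)
moved, angles, shift = random_rigid_volume(vol, seed=0)
slices = sample_slices(moved)
print(moved.shape, slices.shape, angles.round(2), shift.round(2))
```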
{"title":"A self-attention model for robust rigid slice-to-volume registration of functional MRI","authors":"Samah Khawaled , Onur Afacan , Simon K. Warfield , Moti Freiman","doi":"10.1016/j.compmedimag.2025.102643","DOIUrl":"10.1016/j.compmedimag.2025.102643","url":null,"abstract":"<div><div>Functional Magnetic Resonance Imaging (fMRI) is vital in neuroscience, enabling investigations into brain disorders, treatment monitoring, and brain function mapping. However, head motion during fMRI scans, occurring between shots of slice acquisition, can result in distortion, biased analyses, and increased costs due to the need for scan repetitions. Therefore, retrospective slice-level motion correction through slice-to-volume registration (SVR) is crucial. Previous studies have utilized deep learning (DL) based models to address the SVR task; however, they overlooked the uncertainty stemming from the input stack of slices and did not assign weighting or scoring to each slice. Treating all slices equally ignores the variability in their relevance, leading to suboptimal predictions. In this work, we introduce an end-to-end SVR model for aligning 2D fMRI slices with a 3D reference volume, incorporating a self-attention mechanism to enhance robustness against input data variations and uncertainties. Our SVR model utilizes independent slice and volume encoders and a self-attention module to assign pixel-wise scores for each slice. We used the publicly available Healthy Brain Network (HBN) dataset. We split the volumes into training (64%), validation (16%), and test (20%) sets. To conduct the simulated motion study, we synthesized rigid transformations across a wide range of parameters and applied them to the reference volumes. Slices were then sampled according to the acquisition protocol to generate 2,000, 500, and 200 3D volume–2D slice pairs for the training, validation, and test sets, respectively. Our experimental results demonstrate that our model achieves competitive performance in terms of alignment accuracy compared to state-of-the-art deep learning-based methods (Euclidean distance of 0.93 [mm] vs. 1.86 [mm], a paired t-test with a <span><math><mi>p</mi></math></span>-value of <span><math><mrow><mi>p</mi><mo><</mo><mn>0</mn><mo>.</mo><mn>03</mn></mrow></math></span>). Furthermore, our approach exhibits faster registration speed compared to conventional iterative methods (0.096 s vs. 1.17 s). Our end-to-end SVR model facilitates real-time head motion tracking during fMRI acquisition, ensuring reliability and robustness against uncertainties in the inputs.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102643"},"PeriodicalIF":4.9,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mamba-based context-aware local feature network for vessel detail enhancement
Pub Date: 2025-09-12. DOI: 10.1016/j.compmedimag.2025.102645
Keyi Han , Anqi Xiao , Jie Tian , Zhenhua Hu
Objective
Blood vessel analysis is essential in various clinical fields. Detailed vascular imaging enables clinicians to assess abnormalities and make timely, effective interventions. Near-infrared-II (NIR-II, 1000–1700 nm) fluorescence imaging offers superior resolution, sensitivity, and deeper tissue visualization, making it highly promising for vascular imaging. However, deep vessels exhibit relatively low contrast, making differentiation challenging, and accurate vessel segmentation remains a difficult task.
Methods
We propose CALFNet, a context-aware local feature network based on the Mamba module, which can segment more vascular detail in low-contrast regions. CALFNet follows a UNet-like architecture overall, with a ResNet-based encoder for extracting local features and a Mamba-based context-aware module in the latent space for capturing the global context (a minimal sketch of this layout follows the abstract). By incorporating global vessel contextual information, the network can enhance segmentation performance in locally low-contrast areas, capturing finer vessel structures more effectively. Furthermore, a feature-enhancement module between the encoder and decoder is designed to preserve critical local features from the encoder and use them to further refine the vascular details in the decoder's feature representations.
Results
We conducted experiments on two types of clinical datasets, including an NIR-II fluorescent vascular imaging dataset and retinal vessel datasets captured under visible light. The results show that CALFNet outperforms the comparison methods, demonstrating superior robustness and achieving more accurate vessel segmentation, particularly in low-contrast regions.
Conclusion and Significance
CALFNet is an effective vessel segmentation network that performs particularly well at accurately segmenting vessels within low-contrast regions. It can enhance the capability of NIR-II fluorescence imaging for vascular analysis, providing valuable support for clinical diagnosis and medical intervention.
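As referenced in the Methods above, the abstract outlines the layout (ResNet-style local encoder, Mamba-based context module in the latent space, decoder with feature enhancement) but not its internals. The skeleton below is a minimal sketch of that encoder–context–decoder pattern. It assumes the third-party mamba_ssm package is installed; the tiny convolutional encoder/decoder, channel sizes, and the simple additive skip standing in for the feature-enhancement path are placeholders rather than CALFNet's actual blocks.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba_ssm package is available

class ContextAwareBottleneck(nn.Module):
    """Flatten the bottleneck feature map into a token sequence, run a Mamba block
    over it for global context, and fold it back into a 2D map."""
    def __init__(self, dim):
        super().__init__()
        self.mamba = Mamba(d_model=dim)

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        tokens = self.mamba(tokens)                      # sequence mixing = global context
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class TinySegNet(nn.Module):
    """Toy UNet-like wrapper: local conv encoder, Mamba bottleneck, conv decoder,
    with an additive skip standing in for the feature-enhancement path."""
    def __init__(self, in_ch=1, dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU())
        self.context = ContextAwareBottleneck(dim)
        self.dec = nn.Sequential(nn.ConvTranspose2d(dim, dim, 2, stride=2), nn.ReLU(),
                                 nn.Conv2d(dim, 1, 1))

    def forward(self, x):
        f = self.enc(x)
        g = self.context(f) + f                          # keep local detail alongside context
        return self.dec(g)                               # vessel mask logits
```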
{"title":"Mamba-based context-aware local feature network for vessel detail enhancement","authors":"Keyi Han , Anqi Xiao , Jie Tian , Zhenhua Hu","doi":"10.1016/j.compmedimag.2025.102645","DOIUrl":"10.1016/j.compmedimag.2025.102645","url":null,"abstract":"<div><h3>Objective</h3><div>Blood vessel analysis is essential in various clinical fields. Detailed vascular imaging enables clinicians to assess abnormalities and make timely, effective interventions. Near-infrared-II (NIR-II, 1000–1700 nm) fluorescence imaging offers superior resolution, sensitivity, and deeper tissue visualization, making it highly promising for vascular imaging. However, deep vessels exhibit relatively low contrast, making differentiation challenging, and accurate vessel segmentation remains a difficult task.</div></div><div><h3>Methods</h3><div>We propose CALFNet, a context-aware local feature network based on the Mamba module, which can segment more vascular details in low-contrast regions. CALFNet overall follows a UNet-like architectures, with a ResNet-based encoder for extracting local features and a Mamba-based context-aware module in the latent space for the awareness of the global context. By incorporating the global vessel contextual information, the network can enhance segmentation performance in locally low-contrast areas, capturing finer vessel structures more effectively. Furthermore, a feature-enhance module between the encoder and decoder is designed to preserve critical historical local features from the encoder and use them to further refine the vascular details in the decoder's feature representations.</div></div><div><h3>Results</h3><div>We conducted experiments on two types of clinical datasets, including an NIR-II fluorescent vascular imaging dataset and retinal vessel datasets captured under visible light. The results show that CALFNet outperforms the comparison methods, demonstrating superior robustness and achieving more accurate vessel segmentation, particularly in low-contrast regions.</div></div><div><h3>Conclusion and Significance</h3><div>CALFNet is an effective vessel segmentation network showing better performance in accurately segmenting vessels within low-contrast regions. It can enhance the capability of NIR-II fluorescence imaging for vascular analysis, providing valuable support for clinical diagnosis and medical intervention.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102645"},"PeriodicalIF":4.9,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145118689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and fast monocular endoscopic depth estimation of structure-content integrated diffusion
Pub Date: 2025-09-06. DOI: 10.1016/j.compmedimag.2025.102640
Min Tan , Yushun Tao , Boyun Zheng , Gaosheng Xie , Zeyang Xia , Jing Xiong
Endoscopic depth estimation is crucial for video understanding, robotic navigation, and 3D reconstruction in minimally invasive surgeries. However, existing methods for monocular depth estimation often struggle with the challenging conditions of endoscopic imagery, such as complex illumination, narrow luminal spaces, and low-contrast surfaces, resulting in inaccurate depth predictions. To address these challenges, we propose Structure-Content Integrated Diffusion Estimation (SCIDE) for accurate and fast endoscopic depth estimation. Specifically, we introduce the Structure Content Extractor (SC-Extractor), a module specifically designed to extract structure and content priors to guide the depth estimation process in endoscopic environments. Additionally, we propose the Fast Optimized Diffusion Sampler (FODS) to meet the real-time needs of endoscopic surgery scenarios. FODS is a general sampling mechanism that optimizes the selection of time steps in diffusion models. Our method (SCIDE) shows remarkable performance, with an RMSE of 0.0875 and a 74.2% reduction in inference time when using FODS. These results demonstrate that our SCIDE framework achieves state-of-the-art accuracy in endoscopic depth estimation and makes real-time application feasible in endoscopic surgeries (https://misrobotx.github.io/scide/).
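FODS is described only as a mechanism that optimizes which diffusion time steps are visited; its selection rule is not given here. The sketch below shows the generic ingredient such a sampler builds on: deterministic DDIM-style sampling over a small subset of the training timesteps. The evenly spaced subset is purely a placeholder for the optimized schedule, and `eps_model` is an assumed noise-prediction network, not SCIDE's.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, x_T, alphas_cumprod, num_steps=10):
    """Deterministic DDIM sampling (eta = 0) over a reduced timestep schedule.
    eps_model(x, t) predicts noise; alphas_cumprod is the (T,) training schedule."""
    T = alphas_cumprod.shape[0]
    steps = torch.linspace(T - 1, 0, num_steps).long()   # placeholder for an optimized schedule
    x = x_T
    for i, t in enumerate(steps):
        a_t = alphas_cumprod[t]
        a_prev = (alphas_cumprod[steps[i + 1]] if i + 1 < len(steps)
                  else alphas_cumprod.new_tensor(1.0))
        t_batch = torch.full((x.shape[0],), int(t), device=x.device, dtype=torch.long)
        eps = eps_model(x, t_batch)
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()        # predicted clean sample
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps    # deterministic update
    return x
```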
{"title":"Accurate and fast monocular endoscopic depth estimation of structure-content integrated diffusion","authors":"Min Tan , Yushun Tao , Boyun Zheng , Gaosheng Xie , Zeyang Xia , Jing Xiong","doi":"10.1016/j.compmedimag.2025.102640","DOIUrl":"10.1016/j.compmedimag.2025.102640","url":null,"abstract":"<div><div>Endoscopic depth estimation is crucial for video understanding, robotic navigation, and 3D reconstruction in minimally invasive surgeries. However, existing methods for monocular depth estimation often struggle with the challenging conditions of endoscopic imagery, such as complex illumination, narrow luminal spaces, and low-contrast surfaces, resulting in inaccurate depth predictions. To address these challenges, we propose the Structure-Content Integrated Diffusion Estimation (SCIDE) for accurate and fast endoscopic depth estimation. Specifically, we introduce the Structure Content Extractor (SC-Extractor), a module specifically designed to extract structure and content priors to guide the depth estimation process in endoscopic environments. Additionally, we propose the Fast Optimized Diffusion Sampler (FODS) to meet the real-time needs in endoscopic surgery scenarios. FODS is a general sampling mechanism that optimizes selection of time steps in diffusion models. Our method (SCIDE) shows remarkable performance with an RMSE value of 0.0875 and a reduction of 74.2% in inference time when using FODS. These results demonstrate that our SCIDE framework achieves state-of-the-art accuracy of endoscopic depth estimation, and making real-time application feasible in endoscopic surgeries. <span><span>https://misrobotx.github.io/scide/</span><svg><path></path></svg></span></div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102640"},"PeriodicalIF":4.9,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145026858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Generic Abdominal Multi-Organ Segmentation with multiple partially labeled datasets
Pub Date: 2025-09-01. DOI: 10.1016/j.compmedimag.2025.102642
Xiang Li , Faming Fang , Liyan Ma , Tieyong Zeng , Guixu Zhang , Ming Xu
An increasing number of publicly available datasets have facilitated the exploration of building universal medical segmentation models. Existing approaches address the partially labeled problem of each dataset by harmonizing labels across datasets and independently focusing on the labeled foreground regions. However, significant challenges persist, particularly in the form of cross-site domain shifts and the limited utilization of partially labeled datasets. In this paper, we propose the GAMOS (Generic Abdominal Multi-Organ Segmentation) framework. Specifically, GAMOS integrates a self-guidance strategy to adopt diffusion models for the partial labeling issue, while employing a self-distillation mechanism to effectively leverage unlabeled data. A sparse semantic memory is introduced to mitigate domain shifts by ensuring consistent representations in the latent space. To further enhance performance, we design a sparse similarity loss to align multi-view memory representations and enhance the discriminability and compactness of the memory vectors. Extensive experiments on real-world medical datasets demonstrate the superiority and generalization ability of GAMOS. It achieves a mean Dice Similarity Coefficient (DSC) of 91.33% and a mean 95th percentile Hausdorff Distance (HD95) of 1.83 on labeled foreground regions. For unlabeled foreground regions, GAMOS obtains a mean DSC of 86.88% and a mean HD95 of 3.85, outperforming existing state-of-the-art methods.
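When training and evaluating across partially labeled datasets, the core practical ingredient is scoring only the organs a given dataset actually annotates, which is how the labeled-foreground metrics above are defined. The snippet is a generic masked soft-Dice written for that situation; the class-presence mask convention is an assumption for illustration, not GAMOS's actual loss.

```python
import torch

def partial_dice(probs, target_onehot, labeled_classes, eps=1e-6):
    """Soft Dice averaged only over the classes a dataset actually labels.
    probs:           (B, C, H, W) softmax probabilities
    target_onehot:   (B, C, H, W) one-hot ground truth (zeros for unlabeled classes)
    labeled_classes: (B, C) boolean, True where class c is annotated for sample b
    A training loss would typically be 1 minus this value."""
    dims = (2, 3)
    inter = (probs * target_onehot).sum(dims)               # (B, C)
    denom = probs.sum(dims) + target_onehot.sum(dims)       # (B, C)
    dice = (2 * inter + eps) / (denom + eps)                # (B, C)
    mask = labeled_classes.float()
    return (dice * mask).sum() / mask.sum().clamp_min(1.0)  # mean over labeled classes only

# Toy usage: 3 organ channels, only the first two annotated for this sample.
probs = torch.softmax(torch.randn(1, 3, 8, 8), dim=1)
target = torch.zeros(1, 3, 8, 8)
target[0, 0, :4] = 1
target[0, 1, 4:] = 1
print(partial_dice(probs, target, torch.tensor([[True, True, False]])))
```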
{"title":"Towards Generic Abdominal Multi-Organ Segmentation with multiple partially labeled datasets","authors":"Xiang Li , Faming Fang , Liyan Ma , Tieyong Zeng , Guixu Zhang , Ming Xu","doi":"10.1016/j.compmedimag.2025.102642","DOIUrl":"10.1016/j.compmedimag.2025.102642","url":null,"abstract":"<div><div>An increasing number of publicly available datasets have facilitated the exploration of building universal medical segmentation models. Existing approaches address partially labeled problem of each dataset by harmonizing labels across datasets and independently focusing on the labeled foreground regions. However, significant challenges persist, particularly in the form of cross-site domain shifts and the limited utilization of partially labeled datasets. In this paper, we propose a GAMOS (<strong>G</strong>eneric <strong>A</strong>bdominal <strong>M</strong>ulti-<strong>O</strong>rgan <strong>S</strong>egmentation) framework. Specifically, GAMOS integrates a self-guidance strategy to adopt diffusion models for partial labeling issue, while employing a self-distillation mechanism to effectively leverage unlabeled data. A sparse semantic memory is introduced to mitigate domain shifts by ensuring consistent representations in the latent space. To further enhance performance, we design a sparse similarity loss to align multi-view memory representations and enhance the discriminability and compactness of the memory vectors. Extensive experiments on real-world medical datasets demonstrate the superiority and generalization ability of GAMOS. It achieves a mean Dice Similarity Coefficient (DSC) of 91.33% and a mean 95th percentile Hausdorff Distance (HD95) of 1.83 on labeled foreground regions. For unlabeled foreground regions, GAMOS obtains a mean DSC of 86.88% and a mean HD95 of 3.85, outperforming existing state-of-the-art methods.</div></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"125 ","pages":"Article 102642"},"PeriodicalIF":4.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144932397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}