Major depressive disorder (MDD), or depression, is a chronic mental illness that significantly impacts individuals' well-being and is often diagnosed at advanced stages, increasing the risk of suicide. Current diagnostic practices, which rely heavily on subjective assessments and patient self-reports, are often hindered by under-reporting and the failure to detect early, subtle symptoms. Early detection of MDD is crucial and requires monitoring vital signs in daily living conditions. The electroencephalogram (EEG) is a valuable tool for monitoring brain activity, providing critical information on MDD and its underlying neurological mechanisms. Traditional EEG systems typically record from multiple channels, making them impractical for home-based monitoring, whereas wearable sensors can effectively capture single-channel EEG data. However, generating meaningful features from these data is challenging because it requires specialized domain knowledge and significant computational power, which can hinder real-time processing. To address these issues, this study develops a deep learning model for the binary classification of MDD using single-channel EEG data. We evaluated specific channels drawn from the central, frontal, occipital, temporal, and parietal regions. The channels Fp1, F8, and Cz achieved an accuracy of 90% when analyzed with a Convolutional Neural Network (CNN) under leave-one-subject-out cross-validation on a public dataset. These results highlight the potential of single-channel EEG data for reliable MDD diagnosis, offering a less intrusive and more convenient wearable solution for mental health assessment.
{"title":"Simplifying Depression Diagnosis: Single-Channel EEG and Deep Learning Approaches.","authors":"Shruthi Narayanan Vaniya, Ahsan Habib, Maia Angelova, Chandan Karmakar","doi":"10.1109/JBHI.2025.3631326","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3631326","url":null,"abstract":"<p><p>Major depressive disorder (MDD) or depression is a chronic mental illness that significantly impacts individuals' well-being and is often diagnosed at advanced stages, increasing the risk of suicide. Current diagnostic practices, which rely heavily on subjective assessments and patient self-reports, are often hindered by challenges such as under-reporting and the failure to detect early, subtle symptoms. Early detection of MDD is crucial and requires monitoring vital signs in daily living conditions. The electroencephalogram (EEG) is a valuable tool for monitoring brain activity, providing critical information on MDD and its underlying neurological mechanisms. While traditional EEG systems typically involve multiple channels for recording, making them impractical for home-based monitoring, wearable sensors can effectively capture single-channel EEG data. However, generating meaningful features from these data poses challenges due to the need for specialized domain knowledge and significant computational power, which can hinder real-time processing. To address these issues, our study focuses on developing a deep learning model for the binary classification of MDD using single-channel EEG data. We focused on specific channels from various brain regions such as central, frontal, occipital, temporal, and parietal. Our study found that the channels Fp1, F8 and Cz achieved an impressive accuracy of 90% when analyzed using a Convolutional Neural Network (CNN) with leave-one-subject-out cross-validation on a public dataset. Our study highlights the potential of utilizing single-channel EEG data for reliable MDD diagnosis, providing a less intrusive and more convenient wearable solution for mental health assessment.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146105311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-02. DOI: 10.1109/JBHI.2026.3659853
Chao-Chia Lin, Shanq-Jang Ruan, Yu-Jen Wang
Accurate segmentation of medical images, particularly for anatomical structures with irregular shapes and low contrast such as the esophagus, remains a significant challenge. To address this issue, we propose MEM-UNet, a robust 3D Mamba-based UNet framework enhanced by mathematical morphology. Our approach adapts the State Space Model (SSM) in Mamba to support three-dimensional CT volumes, establishing an effective 3D perception backbone for the UNet architecture. In addition, we incorporate Morphology-Aware Spatial-Channel Attention (MASCA) blocks into the skip connections, where Morphology-Enhanced Spatial Convolution (MESC) augments spatial representations while Squeeze-and-Excitation (SE) highlights channel-wise features. This integration effectively leverages the shape awareness provided by morphological operations, thus improving boundary precision. To further refine segmentation, we introduce a Morphology-Enhanced Decision (MED) layer that sharpens contour boundaries and performs voxel-level classification with high precision. Extensive experiments on the SegTHOR and BTCV datasets demonstrate that MEM-UNet surpasses state-of-the-art models, achieving Dice Similarity Coefficient (DSC) scores of 87.42% and 74.86% for multi-organ segmentation, and 78.94% and 67.70% for esophagus segmentation, respectively. Ablation studies confirm the effectiveness of the proposed components and highlight the benefits of integrating mathematical morphology into our pipeline. The implementation is available at https://gitfront.io/r/cheee123/DDTJhrf3LRMd/MEM-UNet/.
{"title":"MEM-UNet: Morphology-Enhanced 3D Mamba UNet for Esophagus Segmentation.","authors":"Chao-Chia Lin, Shanq-Jang Ruan, Yu-Jen Wang","doi":"10.1109/JBHI.2026.3659853","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3659853","url":null,"abstract":"<p><p>Accurate segmentation of medical images, particularly for anatomical structures with irregular shapes and low contrast such as the esophagus, remains a significant challenge. To address this issue, we propose MEM-UNet, a robust 3D Mamba-based UNet framework enhanced by mathematical morphology. Our approach adapts the State Space Model (SSM) in Mamba to support three-dimensional CT volumes, establishing an effective 3D perception backbone for the UNet architecture. In addition, we incorporate Morphology-Aware Spatial-Channel Attention (MASCA) blocks into the skip connections, where Morphology-Enhanced Spatial Convolution (MESC) augments spatial representations while Squeeze-and-Excitation (SE) highlights channel- wise features. This integration effectively leverages the shape-awareness provided by morphological operations, thus improving boundary precision. To further refine segmentation, we introduce a Morphology-Enhanced Decision (MED) layer that sharpens contour boundaries and performs voxel-level classification with high precision. Extensive experiments on SegTHOR and BTCV datasets demonstrate that MEM-UNet surpasses state-of-the-art models, achieving Dice Similarity Coefficient (DSC) scores of 87.42% and 74.86% for multi-organ segmentation, and 78.94% and 67.70% for esophagus segmentation, respectively. Ablation studies confirm the effectiveness of the proposed components and highlight the benefits of integrating mathematical morphology into our pipeline. The implementation is available at https://gitfront.io/r/cheee123/DDTJhrf3LRMd/MEM-UNet/.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146105286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2023.3340956
Abdul Qayyum, Imran Razzak, Moona Mazher, Tariq Khan, Weiping Ding, Steven Niederer
The limited availability of large, high-quality annotated datasets in the medical domain poses a substantial challenge for segmentation tasks. To mitigate the reliance on annotated training data, self-supervised pre-training strategies have emerged, particularly contrastive learning methods on dense pixel-level representations. In this work, we propose to capitalize on intrinsic anatomical similarities within medical image data and develop a semantic segmentation framework based on a self-supervised fusion network for settings where annotated volumes are limited. In a unified training phase, we combine segmentation loss with contrastive loss, enhancing the distinction between significant anatomical regions that adhere to the available annotations. To further improve segmentation performance, we introduce an efficient parallel transformer module that leverages multi-view, multi-scale feature fusion and depth-wise features. The proposed transformer architecture, based on multiple encoders, is trained in a self-supervised manner using contrastive loss. Initially, the transformer is trained on an unlabeled dataset. We then fine-tune one encoder using data from the first stage and another encoder using a small set of annotated segmentation masks. These encoder features are subsequently concatenated for brain tumor segmentation. The multi-encoder transformer model yields significantly better outcomes across three medical image segmentation tasks. We validated the proposed solution on diverse medical image segmentation challenge datasets, demonstrating its efficacy by outperforming state-of-the-art methodologies.
{"title":"Two-Stage Self-Supervised Contrastive Learning Aided Transformer for Real-Time Medical Image Segmentation.","authors":"Abdul Qayyum, Imran Razzak, Moona Mazher, Tariq Khan, Weiping Ding, Steven Niederer","doi":"10.1109/JBHI.2023.3340956","DOIUrl":"10.1109/JBHI.2023.3340956","url":null,"abstract":"<p><p>The availability of large, high-quality annotated datasets in the medical domain poses a substantial challenge in segmentation tasks. To mitigate the reliance on annotated training data, self-supervised pre-training strategies have emerged, particularly employing contrastive learning methods on dense pixel-level representations. In this work, we proposed to capitalize on intrinsic anatomical similarities within medical image data and develop a semantic segmentation framework through a self-supervised fusion network, where the availability of annotated volumes is limited. In a unified training phase, we combine segmentation loss with contrastive loss, enhancing the distinction between significant anatomical regions that adhere to the available annotations. To further improve the segmentation performance, we introduce an efficient parallel transformer module that leverages Multiview multiscale feature fusion and depth-wise features. The proposed transformer architecture, based on multiple encoders, is trained in a self-supervised manner using contrastive loss. Initially, the transformer is trained using an unlabeled dataset. We then fine-tune one encoder using data from the first stage and another encoder using a small set of annotated segmentation masks. These encoder features are subsequently concatenated for the purpose of brain tumor segmentation. The multiencoder-based transformer model yields significantly better outcomes across three medical image segmentation tasks. We validated our proposed solution by fusing images across diverse medical image segmentation challenge datasets, demonstrating its efficacy by outperforming state-of-the-art methodologies.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1039-1048"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138800439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3592444
Hong-Jie Dai, Han-Hsiang Wu
Cancer registration is a vital source of information for government-driven cancer prevention and control policies. However, cancer registry abstraction is a complex and labor-intensive process, requiring the extraction of structured data from large volumes of unstructured clinical reports. To address these challenges, we propose a hierarchical temporal attention network that applies attention mechanisms at the word, sentence, and document levels, while incorporating temporal and report-type information to capture nuanced relationships within patients' longitudinal data. To ensure robust evaluation, a stratified sampling algorithm was developed to balance the training, validation, and test datasets across 23 coding tasks, mitigating potential biases. The proposed method achieved an average F1-score of 0.82, outperforming existing approaches by prioritizing task-relevant words, sentences, and reports through its attention mechanism. An ablation study confirmed the critical contributions of the proposed components. Furthermore, a prototype visualization tool was developed to support interpretability, giving cancer registrars insight into the decision-making process by visualizing attention at multiple levels of granularity. Overall, the proposed method, combined with the interpretability-focused visualization tool, represents a significant step toward automating cancer registry abstraction from unstructured clinical text in longitudinal settings.
{"title":"Hierarchical Temporal Attention Networks for Cancer Registry Abstraction: Leveraging Longitudinal Clinical Data With Interpretability.","authors":"Hong-Jie Dai, Han-Hsiang Wu","doi":"10.1109/JBHI.2025.3592444","DOIUrl":"10.1109/JBHI.2025.3592444","url":null,"abstract":"<p><p>Cancer registration is a vital source of information for government-driven cancer prevention and control policies. However, cancer registry abstraction is a complex and labor-intensive process, requiring the extraction of structured data from large volumes of unstructured clinical reports. To address these challenges, we propose a hierarchical temporal attention network leveraging attention mechanisms at the word, sentence, and document levels, while incorporating temporal and report type information to capture nuanced relationships within patients' longitudinal data. To ensure robust evaluation, a stratified sampling algorithm was developed to balance the training, validation, and test datasets across 23 coding tasks, mitigating potential biases. The proposed method achieved an average F<sub>1</sub>-score of 0.82, outperforming existing approaches by prioritizing task-relevant words, sentences, and reports through its attention mechanism. An ablation study confirmed the critical contributions of the proposed components. Furthermore, a prototype visualization tool was developed to present interpretability, providing cancer registrars with insights into the decision-making process by visualizing attention at multiple levels of granularity. Overall, the proposed methods combined with the interpretability-focused visualization tool, represent a significant step toward automating cancer registry abstraction from unstructured clinical text in longitudinal settings.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1652-1665"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144707314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3602272
Kai Chen, Mu Nie, Jean-Louis Coatrieux, Yang Chen, Shipeng Xie
Medical imaging has evolved from an auxiliary means of clinical examination into a primary method and an intuitive basis for disease diagnosis, providing comprehensive, full-cycle health protection. The Internet of Medical Things (IoMT) allows medical equipment, intelligent terminals, medical infrastructure, and other elements of medical practice to be interconnected, eliminating information silos and data fragmentation. Medical images disseminated over the IoMT contain a wide range of sensitive patient information, so protecting patients' personal information is vital. In this work, an adversarial-improved reversible steganography network (Airs-Net) for computed tomography (CT) images in the IoMT is presented. Specifically, Airs-Net adopts a prediction-embedding strategy and consists mainly of an image restoration network, an embedding-location network, and a discriminator. The image restoration network restores the pixel prediction errors of the restoration set in integer- and non-integer-scaled images of arbitrary size when information is concealed. The embedding-location network automatically selects pixel locations for information embedding based on the interpolated features of the degraded image. The restored image, embedding location map, and embedding information are fed into the embedder, and the quality of the resulting secret-carrying image is continuously optimized by the discriminator. Quantitative results show that Airs-Net outperforms state-of-the-art methods in both PSNR and SSIM. Further, qualitative and quantitative analyses in specific clinical application scenarios and across multiple types of medical image information hiding demonstrate the strong generalization and practical applicability of Airs-Net.
{"title":"Airs-Net: Adversarial-Improved Reversible Steganography Network for CT Images in the Internet of Medical Things and Telemedicine.","authors":"Kai Chen, Mu Nie, Jean-Louis Coatrieux, Yang Chen, Shipeng Xie","doi":"10.1109/JBHI.2025.3602272","DOIUrl":"10.1109/JBHI.2025.3602272","url":null,"abstract":"<p><p>Medical imaging has developed from an auxiliary means of clinical examination into a significant method and intuitive basis for clinical diagnosis of diseases, providing all-around and full-cycle health protection for the people. The Internet of Medical Things (IoMT) allows medical equipment, intelligent terminals, medical infrastructure, and other elements of medical production to be interconnected, eliminating information silos and data fragmentation. Medical images disseminated in IoMT contain a wide diversity of sensitive patient information, which means protecting the patient's personal information is vital. In this work, an Adversarial-improved reversible steganography network (Airs-Net) for computed tomography (CT) images in the IoMT is presented. Specifically, the Airs-Net adopting the prediction-embedding strategy mainly consists of an image restoration network, an embedded pixel location network, and a discriminator. The image restoration network is effective in restoring the pixel prediction error of the restoration set in integer and non-integer scaled images of arbitrary size when information is concealed. The embedded information location network can automatically select pixel locations for information embedding based on the interpolated image features of the degraded image. The restored image, embedding location map, and embedding information are fed into the embedder for information embedding, and the subsequent secret-carrying image is continuously optimized for the quality of the information-embedded image by the discriminator. Quantitative results show that Airs-Net outperforms state-of-the-art methods in both PSNR and SSIM. Further, the qualitative and quantitative results and analyses under specific clinical application scenarios and in coping with multiple types of medical image information hiding demonstrate the excellent generalization performance and practical application capability of the Airs-Net.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1479-1491"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144951930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As the impact of chronic mental disorders increases, multimodal sentiment analysis (MSA) has emerged as a way to improve diagnosis and treatment. In this paper, our approach leverages disentangled representation learning to address modality heterogeneity, with self-supervised learning as guidance. Self-supervised learning generates pseudo unimodal labels and guides modality-specific representation learning, preventing the acquisition of meaningless features. Additionally, we propose a text-centric fusion that effectively mitigates the impact of noise and redundant information and fuses the acquired disentangled representations into a comprehensive multimodal representation. We evaluate our model on three publicly available benchmark datasets for multimodal sentiment analysis and on a privately collected dataset focusing on schizophrenia counseling. The experimental results demonstrate state-of-the-art performance across various metrics on the benchmark datasets, surpassing related works. Furthermore, our learning algorithm shows promising performance in a real-world application, outperforming our previous work and achieving significant progress in schizophrenia assessment.
{"title":"Self-Supervised Guided Modality Disentangled Representation Learning for Multimodal Sentiment Analysis and Schizophrenia Assessment.","authors":"Hsin-Yang Chang, An-Sheng Liu, Yi-Ting Lin, Chen-Chung Liu, Lue-En Lee, Feng-Yi Chen, Shu-Hui Hung, Li-Chen Fu","doi":"10.1109/JBHI.2025.3604933","DOIUrl":"10.1109/JBHI.2025.3604933","url":null,"abstract":"<p><p>As the impact of chronic mental disorders increases, multimodal sentiment analysis (MSA) has emerged to improve diagnosis and treatment. In this paper, our approach leverages disentangled representation learning to address modality heterogeneity with self-supervised learning as a guidance. The self-supervised learning is proposed to generate pseudo unimodal labels and guide modality-specific representation learning, preventing the acquisition of meaningless features. Additionally, we also propose a text-centric fusion to effectively mitigate the impacts of noise and redundant information and fuse the acquired disentangled representations into a comprehensive multimodal representation. We evaluate our model on three publicly available benchmark datasets for multimodal sentiment analysis and a privately collected dataset focusing on schizophrenia counseling. The experimental results demonstrate state-of-the-art performance across various metrics on the benchmark datasets, surpassing related works. Furthermore, our learning algorithm shows promising performance in real-world applications, outperforming our previous work and achieving significant progress in schizophrenia assessment.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1630-1641"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cervical spondylosis, a complex and prevalent condition, demands precise and efficient diagnostic techniques for accurate assessment. While MRI offers detailed visualization of cervical spine anatomy, manual interpretation remains labor-intensive and prone to error. To address this, we developed an innovative AI-assisted Expert-based Diagnosis System that automates both segmentation and diagnosis of cervical spondylosis using MRI. Leveraging multi-center datasets of cervical MRI images from patients with cervical spondylosis, our system features a pathology-guided segmentation model capable of accurately segmenting key cervical anatomical structures. The segmentation is followed by an expert-based diagnostic framework that automates the calculation of critical clinical indicators. Our segmentation model achieved an average Dice coefficient exceeding 0.90 across four cervical spinal anatomies and demonstrated enhanced accuracy in herniation areas. Diagnostic evaluation further showcased the system's precision, with the lowest mean absolute errors (MAE) for the C2-C7 Cobb angle and the Maximum Spinal Cord Compression (MSCC) coefficient. In addition, our method delivered high accuracy, precision, recall, and F1 scores in herniation localization, K-line status assessment, T2 hyperintensity detection, and Kang grading. Comparative analysis and external validation demonstrate that our system outperforms existing methods, establishing a new benchmark for segmentation and diagnostic tasks in cervical spondylosis.
{"title":"Pathology-Guided AI System for Accurate Segmentation and Diagnosis of Cervical Spondylosis.","authors":"Qi Zhang, Xiuyuan Chen, Ziyi He, Lianming Wu, Kun Wang, Jianqi Sun, Hongxing Shen","doi":"10.1109/JBHI.2025.3598469","DOIUrl":"10.1109/JBHI.2025.3598469","url":null,"abstract":"<p><p>Cervical spondylosis, a complex and prevalent condition, demands precise and efficient diagnostic techniques for accurate assessment. While MRI offers detailed visualization of cervical spine anatomy, manual interpretation remains labor-intensive and prone to error. To address this, we developed an innovative AI-assisted Expert-based Diagnosis System<sup>1</sup> that automates both segmentation and diagnosis of cervical spondylosis using MRI. Leveraging multi-center datasets of cervical MRI images from patients with cervical spondylosis, our system features a pathology-guided segmentation model capable of accurately segmenting key cervical anatomical structures. The segmentation is followed by an expert-based diagnostic framework that automates the calculation of critical clinical indicators. Our segmentation model achieved an impressive average Dice coefficient exceeding 0.90 across four cervical spinal anatomies and demonstrated enhanced accuracy in herniation areas. Diagnostic evaluation further showcased the system's precision, with the lowest mean average errors (MAE) for the C2-C7 Cobb angle and the Maximum Spinal Cord Compression (MSCC) coefficient. In addition, our method delivered high accuracy, precision, recall, and F1 scores in herniation localization, K-line status assessment, T2 hyperintensity detection, and Kang grading. Comparative analysis and external validation demonstrate that our system outperforms existing methods, establishing a new benchmark for segmentation and diagnostic tasks for cervical spondylosis.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1216-1229"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144845829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Segment Anything Model (SAM) has gained renown for its success in image segmentation, benefiting significantly from its pretraining on extensive datasets and its interactive prompt-based segmentation approach. Although highly effective in natural (real-world) image segmentation tasks, SAM encounters significant challenges in medical imaging due to the inherent differences between the two domains. To address these challenges, we propose the Spatial Prior Adapter (SPA) scheme, a parameter-efficient fine-tuning strategy that enhances SAM's adaptability to medical imaging tasks. SPA introduces two novel modules: the Spatial Prior Module (SPM), which captures localized spatial features through convolutional layers, and the Feature Communication Module (FCM), which integrates these features into SAM's image encoder via cross-attention mechanisms. Furthermore, we develop a Multiscale Feature Fusion Module (MSFFM) to enhance SAM's end-to-end segmentation capabilities by effectively aggregating multiscale contextual information. These lightweight modules require minimal computational resources while significantly boosting segmentation performance. Extensive experiments on publicly available medical imaging datasets demonstrate superior performance in both prompt-based and end-to-end segmentation scenarios. These results highlight the potential of the proposed method to bridge the gap between foundation models and domain-specific medical imaging tasks, paving the way for more effective AI-assisted medical diagnostic systems.
{"title":"SPA: Leveraging the SAM With Spatial Priors Adapter for Enhanced Medical Image Segmentation.","authors":"Jihong Hu, Yinhao Li, Rahul Kumar Jain, Lanfen Lin, Yen-Wei Chen","doi":"10.1109/JBHI.2025.3526174","DOIUrl":"10.1109/JBHI.2025.3526174","url":null,"abstract":"<p><p>The Segment Anything Model (SAM) has gained renown for its success in image segmentation, benefiting significantly from its pretraining on extensive datasets and its interactive prompt-based segmentation approach. Although highly effective in natural (real-world) image segmentation tasks, the SAM model encounters significant challenges in medical imaging due to the inherent differences between these two domains. To address these challenges, we propose the Spatial Prior Adapter (SPA) scheme, a parameter-efficient fine-tuning strategy that enhances SAM's adaptability to medical imaging tasks. SPA introduces two novel modules: the Spatial Prior Module (SPM), which captures localized spatial features through convolutional layers, and the Feature Communication Module (FCM), which integrates these features into SAM's image encoder via cross-attention mechanisms. Furthermore, we develop a Multiscale Feature Fusion Module (MSFFM) to enhance SAM's end-to-end segmentation capabilities by effectively aggregating multiscale contextual information. These lightweight modules require minimal computational resources while significantly boosting segmentation performance. Our approach demonstrates superior performance in both prompt-based and end-to-end segmentation scenarios through extensive experiments on publicly available medical imaging datasets. Performance highlights the potential of the proposed method to bridge the gap between foundation models and domain-specific medical imaging tasks. This advancement paves the way for more effective AI-assisted medical diagnostic systems.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"993-1005"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3592873
Zhonghua Chen, Haitao Cao, Lauri Kettunen, Hongkai Wang
Accurate medical image segmentation is crucial for clinical diagnosis and treatment planning. However, class imbalance and boundary vagueness in medical images make it challenging to achieve accurate and precise results; 3D multi-organ segmentation is particularly complex. These challenges are further exacerbated in semi-supervised learning settings with limited labeled data. Existing methods rarely incorporate boundary information effectively to alleviate class imbalance, leading to biased predictions and suboptimal segmentation accuracy. To address these limitations, we propose DBANet, a dual-model framework integrating three key modules. The Confidence-Guided Pseudo-Label Fusion (CPF) module enhances pseudo-label reliability by selecting high-confidence logits, improving training stability when annotations are limited. The Boundary Distribution Awareness (BDA) module dynamically adjusts class weights based on boundary distributions, alleviating class imbalance and enhancing segmentation performance. Additionally, the Boundary Vagueness Awareness (BVA) module further refines boundary delineation by prioritizing regions with blurred boundaries. Experiments on two benchmark datasets validate the effectiveness of DBANet. On the Synapse dataset, DBANet achieves average Dice score improvements of 3.56%, 2.17%, and 5.12% under 10%, 20%, and 40% labeled-data settings, respectively. Similarly, on the WORD dataset, it achieves average Dice score improvements of 1.72%, 0.97%, and 0.65% under 2%, 5%, and 10% labeled-data settings, respectively. These results highlight the potential of boundary-aware adaptive weighting for advancing semi-supervised medical image segmentation.
{"title":"DBANet: Dual Boundary Awareness With Confidence-Guided Pseudo Labeling for Medical Image Segmentation.","authors":"Zhonghua Chen, Haitao Cao, Lauri Kettunen, Hongkai Wang","doi":"10.1109/JBHI.2025.3592873","DOIUrl":"10.1109/JBHI.2025.3592873","url":null,"abstract":"<p><p>Accurate medical image segmentation is crucial for clinical diagnosis and treatment planning. However, class imbalance and vagueness of boundary in medical images make it challenging to achieve accurate and precise results. In particular, 3D multi-organ segmentation is a complex process. These challenges are further exacerbated in semi-supervised learning settings with limited labeled data. Existing methods rarely effectively incorporate boundary information to alleviate class imbalance, leading to biased predictions and suboptimal segmentation accuracy. To address these limitations, we propose DBANet, a dual-model framework integrating three key modules. The Confidence-Guided Pseudo-Label Fusion (CPF) module enhances pseudo-label reliability by selecting high-confidence logits. This improves training stability in limited annotation settings. The Boundary Distribution Awareness (BDA) module dynamically adjusts class weights based on boundary distributions, alleviating class imbalance and enhancing segmentation performance. Additionally, the Boundary Vagueness Awareness (BVA) module further refines boundary delineation by prioritizing regions with blurred boundaries. Experiments on two benchmark datasets validate the effectiveness of DBANet. On the Synapse dataset, DBANet achieves average Dice score improvements of 3.56%, 2.17%, and 5.12% under 10%, 20%, and 40% labeled data settings, respectively. Similarly, on the WORD dataset, DBANet achieves average Dice score improvements of 1.72%, 0.97%, and 0.65% under 2%, 5%, and 10% labeled data settings, respectively. These results highlight the potential of boundary-aware adaptive weighting for advancing semi-supervised medical image segmentation.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1203-1215"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144753167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low reliability has consistently been a challenge in applying deep learning models to high-risk decision-making scenarios. In medical image segmentation, multiple expert annotations can be consulted to reduce subjective bias and reach a consensus, thereby enhancing segmentation accuracy and reliability. To develop a reliable lesion segmentation model, we propose CalDiff, a novel framework that leverages the uncertainty present in multiple annotations, captures real-world diagnostic variability, and provides more informative predictions. To harness the strong generative ability of diffusion models, a dual step-wise and sequence-aware calibration mechanism is proposed that exploits their sequential nature. We evaluate the calibrated model through comprehensive quantitative and visual analysis, addressing the previously overlooked challenge of assessing uncertainty calibration and model reliability in scenarios with multiple annotations and multiple predictions. Experimental results on two lesion segmentation datasets demonstrate that CalDiff produces uncertainty maps that reflect low-confidence areas and thereby indicate likely false predictions. By calibrating uncertainty during training, the uncertain areas produced by our model correlate closely with the areas where the model makes errors at inference. In summary, the uncertainty captured by CalDiff can serve as a powerful indicator that mitigates the risks of adopting the model's outputs, allowing clinicians to prioritize reviewing areas or slices with higher uncertainty and enhancing the model's reliability and trustworthiness in clinical practice.
{"title":"CalDiff: Calibrating Uncertainty and Accessing Reliability of Diffusion Models for Trustworthy Lesion Segmentation.","authors":"Xinxin Wang, Mingrui Yang, Sercan Tosun, Kunio Nakamura, Shuo Li, Xiaojuan Li","doi":"10.1109/JBHI.2025.3624331","DOIUrl":"10.1109/JBHI.2025.3624331","url":null,"abstract":"<p><p>Low reliability has consistently been a challenge in the application of deep learning models for high-risk decision-making scenarios. In medical image segmentation, multiple expert annotations can be consulted to reduce subjective bias and reach a consensus, thereby enhancing the segmentation accuracy and reliability. To develop a reliable lesion segmentation model, we propose CalDiff, a novel framework that can leverage the uncertainty from multiple annotations, capture real-world diagnostic variability and provide more informative predictions. To harness the superior generative ability of diffusion models, a dual step-wise and sequence-aware calibration mechanism is proposed on the basis of the sequential nature of diffusion models. We evaluate the calibrated model through a comprehensive quantitative and visual analysis, addressing the previously overlooked challenge of assessing uncertainty calibration and model reliability in scenarios with multiple annotations and multiple predictions. Experimental results on two lesion segmentation datasets demonstrate that CalDiff produces uncertainty maps that can reflect low confidence areas, further indicating the false predictions made by the model. By calibrating the uncertainty in the training phase, the uncertain areas produced by our model are closely correlated with areas where the model has made errors in the inference. In summary, the uncertainty captured by CalDiff can serve as a powerful indicator, which can help mitigate the risks of adopting model's outputs, allowing clinicians to prioritize reviewing areas or slices with higher uncertainty and enhancing the model's reliability and trustworthiness in clinical practice.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1555-1567"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12682437/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145354716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}