Multi-level Asymmetric Contrastive Learning for Medical Image Segmentation Pre-training
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669549
Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Zhaoheng Xie, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, May D Wang, Yanye Lu
Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still imperfect solution to this dilemma. First, existing medical contrastive learning strategies focus on extracting image-level representations and ignore the abundant multi-level representations. Furthermore, they underutilize the decoder, either initializing it randomly or pre-training it separately from the encoder, thereby neglecting the potential collaboration between the two. To address these issues, we propose MACL, a novel multi-level asymmetric contrastive learning framework for medical image segmentation. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations, ensuring that the encoder and decoder capture comprehensive details from representations of varying scales and granularities during pre-training. Finally, experiments on 8 medical image datasets show that MACL outperforms 11 existing contrastive learning strategies: it yields visibly more precise predictions and Dice scores 1.72%, 7.87%, 2.49%, and 1.48% higher than the previous best results on ACDC, MMWHS, HVSMR, and CHAOS with 10% labeled data, respectively. MACL also generalizes well across 5 U-Net variant backbones.
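A multi-level objective of this kind can be assembled from per-level contrastive terms. Below is a minimal PyTorch sketch combining feature-, image-, and pixel-level InfoNCE losses; the InfoNCE form, the pooled embedding shapes, and the equal level weights are illustrative assumptions, not MACL's exact formulation.

```python
# Minimal sketch: one InfoNCE term per representation level, summed with weights.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE between two batches of paired embeddings, each (N, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)      # positives on the diagonal

def multi_level_loss(feat, img, pix, weights=(1.0, 1.0, 1.0)):
    """feat/img/pix are (anchor, positive) embedding pairs for the
    feature-, image-, and pixel-level views of the same inputs."""
    losses = [info_nce(a, p) for a, p in (feat, img, pix)]
    return sum(w * l for w, l in zip(weights, losses))
```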
{"title":"Multi-level Asymmetric Contrastive Learning for Medical Image Segmentation Pre-training.","authors":"Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Zhaoheng Xie, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, May D Wang, Yanye Lu","doi":"10.1109/JBHI.2026.3669549","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669549","url":null,"abstract":"<p><p>Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. Firstly existing medical contrastive learning strategies focus on extracting image-level representation, which ignores abundant multi-level representations. Furthermore they underutilize the decoder either by random initialization or separate pre-training from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for enhancing medical image segmentation. Specifically, we design an asymmetric contrastive learning structure to pre-train encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations to ensure the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase. Finally, experiments on 8 medical image datasets indicate our MACL framework outperforms existing 11 contrastive learning strategies. i.e. Our MACL achieves a superior performance with more precise predictions from visualization figures and 1.72%, 7.87%, 2.49% and 1.48% Dice higher than previous best results on ACDC, MMWHS, HVSMR and CHAOS with 10% labeled data, respectively. And our MACL also has a strong generalization ability among 5 variant U-Net backbones.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147348315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Conversational Literature Retrieval Quality via Personalized Profile-Based Re-ranking
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669741
ShuaiYu Zhang, Huihui Shao, Zhenping Xie
Academic literature retrieval is constrained by the paradox of "information overload" versus "evidence scarcity", a tension that deepens when researchers iteratively refine their queries in multi-turn conversational settings. To address this challenge, we propose Conversational Literature Personalized Re-ranking (CLPR), a framework that unifies dense semantic retrieval with personalized user profiling. CLPR first performs a broad, high-recall retrieval to collect candidate documents, then uses a large language model to compress the conversational history into a concise textual profile that encodes sequential continuity, immediate focus, and long-term research background. The generated profile serves as a pseudo-query for a neural cross-encoder to produce the final ranking. Cross-domain testing on the public LitSearch (computer science) benchmark confirms its robust generalization, yielding an NDCG@10 of 0.4793. On MedCorpus, a new multi-turn biomedical conversational retrieval benchmark constructed for this study, CLPR attains state-of-the-art performance with P@1 = 0.9497 and NDCG@10 = 0.9271, surpassing the strongest baseline by substantial margins. Ablation shows that long-term background cues contribute most and that maintaining a short, up-to-date profile across turns outperforms a static one. CLPR therefore delivers accurate, personalized literature retrieval and can accelerate evidence synthesis across scientific domains.
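The two-stage pipeline (high-recall dense retrieval, then profile-based cross-encoder re-ranking) can be sketched as follows. The sentence-transformers models, the hand-written profile string, and the top-k setting are illustrative assumptions; CLPR's own retriever and LLM-generated profiles may differ.

```python
# Minimal retrieve-then-rerank sketch: the profile acts as the pseudo-query.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")                # dense retriever
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")    # cross-encoder

docs = ["...candidate abstract 1...", "...candidate abstract 2..."]
doc_emb = retriever.encode(docs, convert_to_tensor=True)

# Stage 1: broad, high-recall dense retrieval with the raw query.
query = "graph neural networks for drug-target interaction"
hits = util.semantic_search(retriever.encode(query, convert_to_tensor=True),
                            doc_emb, top_k=50)[0]

# Stage 2: an LLM would compress the conversation into a textual profile
# (sequential continuity, immediate focus, long-term background); a
# hand-written string stands in for it here.
profile = ("Background: computational drug discovery. "
           "Current focus: GNN architectures for DTI prediction.")

# The profile serves as a pseudo-query for the cross-encoder re-ranker.
pairs = [(profile, docs[h["corpus_id"]]) for h in hits]
scores = reranker.predict(pairs)
ranked = sorted(zip(scores, hits), key=lambda x: -x[0])
```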
{"title":"Improving Conversational Literature Retrieval Quality via Personalized Profile-Based Re-ranking.","authors":"ShuaiYu Zhang, Huihui Shao, Zhenping Xie","doi":"10.1109/JBHI.2026.3669741","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669741","url":null,"abstract":"<p><p>Academic literature retrieval is constrained by the paradox of \"information overload\" versus \"evidence scarcity\", a tension that deepens when researchers iteratively refine their queries in multi-turn conversational settings. To address this challenge, we propose Conversational Literature Personalized Re-ranking (CLPR), a personalized framework that unifies dense semantic retrieval with personalized user profiling. CLPR first performs a broad high-recall retrieval to collect candidate documents, then compresses conversational history into a concise textual profile that encodes sequential continuity, immediate focus, and long-term research background via a large language model. The generated profile serves as a pseudo-query for a neural cross-encoder to produce the final ranking. Cross-domain testing on the public LitSearch (computer science) benchmark confirms its robust generalization, yielding an NDCG@10 of 0.4793. On MedCorpus, a new multi-turn biomedical conversational retrieval benchmark constructed for this study, CLPR attains state-of-the-art performance with P@1 = 0.9497 and NDCG@10 = 0.9271, surpassing the strongest baseline by substantial margins. Ablation shows long-term background cues contribute most, and maintaining a short, up-to-date profile across turns outperforms a static one. CLPR therefore delivers accurate, personalized literature retrieval and can accelerate evidence synthesis across scientific domains.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147348382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R2GenCSR: Mining Contextual and Residual Information for LLMs-based Radiology Report Generation
Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669539
Inspired by the tremendous success of Large Language Models (LLMs), existing radiology report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image and then feed them into an LLM for text generation. How to extract information that more effectively helps the LLM improve the final report remains an open problem. Additionally, visual Transformer backbones bring high computational complexity. To address these issues, this paper proposes a novel context-guided, efficient radiology report generation framework. Specifically, we introduce Mamba as the vision backbone with linear complexity, obtaining performance comparable to that of a strong Transformer model. More importantly, during training we retrieve context from the training set for the samples within each mini-batch, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to the LLM to generate high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU X-Ray, MIMIC-CXR, CheXpert Plus) fully validate the effectiveness of our proposed model. The source code is available at https://github.com/Event-AHU/Medical_Image_Analysis.
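The batch-level context retrieval described above amounts to a nearest- and farthest-neighbor lookup over training-set features. A minimal sketch follows, where the cosine-similarity metric, the feature shapes, and the prompt template are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch: fetch the most- and least-similar training reports for each
# mini-batch sample, then splice them into the LLM prompt as context.
import torch
import torch.nn.functional as F

def retrieve_context(batch_feats, train_feats, train_reports, k=1):
    """batch_feats: (B, D) visual features; train_feats: (N, D)."""
    sims = F.normalize(batch_feats, dim=1) @ F.normalize(train_feats, dim=1).t()
    pos = sims.topk(k, dim=1).indices                  # most similar: positives
    neg = sims.topk(k, dim=1, largest=False).indices   # least similar: negatives
    return [([train_reports[int(j)] for j in pos[b]],
             [train_reports[int(j)] for j in neg[b]])
            for b in range(batch_feats.size(0))]

# Hypothetical prompt assembly around the retrieved context.
prompt_template = ("Reports of similar cases: {positives}\n"
                   "Reports of dissimilar cases: {negatives}\n"
                   "Write a radiology report for the given chest X-ray.")
```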
{"title":"R2GenCSR: Mining Contextual and Residual Information for LLMs-based Radiology Report Generation.","authors":"Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang","doi":"10.1109/JBHI.2026.3669539","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669539","url":null,"abstract":"<p><p>Inspired by the tremendous success of Large Language Models (LLMs), existing Radiology report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image, and then, feed them into the LLM for text generation. How to extract more effective information for the LLMs to help them improve final results is an urgent problem that needs to be solved. Additionally, the use of visual Transformer models also brings high computational complexity. To address these issues, this paper proposes a novel context-guided efficient radiology report generation framework. Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model. More importantly, we perform context retrieval from the training set for samples within each mini-batch during the training phase, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to invoke the LLM for generating high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU X-Ray, MIMIC-CXR, CheXpert Plus) fully validated the effectiveness of our proposed model. The source code is available at https://github.com/Event-AHU/Medical_ Image_Analysis.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147347506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RT-SAM: Visual-Prompt Fusion and Uncertainty Enhancement for Nasopharyngeal Carcinoma Radiotherapy Target Delineation
Hee Guan Khor, Xin Yang, Yihua Sun, Sijuan Huang, Yingni Wang, Jie Wang, Shaobin Wang, Lu Bai, Longfei Ma, Hongen Liao
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669979
Precise delineation of the clinical target volume (CTV) and nodal CTV (CTV$_{nd}$) is crucial for effective radiotherapy planning in nasopharyngeal carcinoma (NPC). Manual contouring is labor-intensive and subject to substantial inter-observer variability, particularly in regions with complex anatomy and indistinct boundaries. This study presents RT-SAM, a novel framework that adapts the Medical Segment Anything Model 2 (MedSAM-2) for automated contouring of the primary CTV and CTV$_{nd}$ in NPC computed tomography (CT) images. The framework synergistically integrates a generalist foundation model (MedSAM-2) with a domain-specific specialist network (a 2D U-Net) through three principal contributions: (1) automated generation of multi-modal prompts (mask, bounding box, and point representations) derived from the specialist network's predictions to guide the generalist model; (2) a Visual-Prompt Fusion Attention (ViPFA) mechanism that optimizes feature-prompt interactions through bidirectional cross-modal attention; and (3) an Uncertainty-Enhanced Prediction Adjustment (UEPA) mechanism that improves robustness via confidence-based refinement and selective domain adaptation. Comprehensive evaluation on a multi-center cohort of 256 clinical NPC cases from Sun Yat-sen University Cancer Center and 212 public NPC cases from the SegRap2025 lymph node CTV dataset, using 5-fold cross-validation, shows that RT-SAM achieves a mean Dice coefficient of 0.796 $\pm$ 0.033 (mean $\pm$ standard deviation), significantly outperforming current state-of-the-art methods. Clinical validation by eight radiation oncologists demonstrates that RT-SAM contours are clinically indistinguishable from expert delineations in blinded Turing assessments, achieve superior quality ratings in 75% of comparisons (mean scores of 2.73 for RT-SAM versus 2.66 for manual expert contours), and attain clinically acceptable ratings in over 97% of cases. These results demonstrate that RT-SAM is a clinically feasible solution for automated CTV contouring, with strong potential to standardize treatment planning and mitigate inter-observer variability in NPC radiotherapy.
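Contribution (1) turns a specialist prediction into SAM-style prompts. A minimal sketch of deriving the three prompt types from a probability map is below; the 0.5 threshold and the centroid as the point prompt are illustrative assumptions, not RT-SAM's exact prompt generator.

```python
# Minimal sketch: mask, bounding-box, and point prompts from a specialist
# network's probability map.
import numpy as np

def prompts_from_mask(prob_map, thresh=0.5):
    mask = (prob_map >= thresh).astype(np.uint8)      # mask prompt
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return mask, None, None                       # nothing segmented
    box = (xs.min(), ys.min(), xs.max(), ys.max())    # bounding-box prompt
    point = (int(xs.mean()), int(ys.mean()))          # point prompt (centroid)
    return mask, box, point
```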
{"title":"RT-SAM: Visual-Prompt Fusion and Uncertainty Enhancement for Nasopharyngeal Carcinoma Radiotherapy Target Delineation.","authors":"Hee Guan Khor, Xin Yang, Yihua Sun, Sijuan Huang, Yingni Wang, Jie Wang, Shaobin Wang, Lu Bai, Longfei Ma, Hongen Liao","doi":"10.1109/JBHI.2026.3669979","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669979","url":null,"abstract":"<p><p>Precise delineation of the clinical target volume (CTV) and nodal CTV (CTV$_{{mathit{nd}}}$) is crucial for effective radiotherapy planning in nasopharyngeal carcinoma (NPC). Manual contouring is labor-intensive and subject to substantial inter-observer variability, particularly in regions with complex anatomy and indistinct boundaries. This study presents RT-SAM, a novel framework that adapts the Medical Segment Anything Model 2 (MedSAM-2) for automated CTV (i.e., primary CTV and CTV$_{nd}$) contouring in NPC computed tomography (CT) images. The framework synergistically integrates a generalist foundation model (MedSAM-2) with a domain-specific specialist network (2D U-Net) through three principal contributions: (1) automated generation of multi-modal prompts-comprising mask, bounding box, and point representations-derived from specialist network predictions to guide the generalist model; (2) a Visual-Prompt Fusion Attention (ViPFA) mechanism that optimizes feature-prompt interactions through bidirectional cross-modal attention; and (3) an Uncertainty-Enhanced Prediction Adjustment (UEPA) mechanism that enhances model robustness via confidence-based refinement and selective domain adaptation. Comprehensive evaluation on a multi-center cohort of 256 clinical NPC cases from Sun Yat-sen University Cancer Center and 212 public NPC cases from the SegRap2025 lymph node CTV dataset using 5- fold cross-validation demonstrates that RT-SAM achieves a mean DICE coefficient of 0.796 $pm$ 0.033 (mean $pm$ standard deviation), significantly outperforming current state-of-the-art methods. Clinical validation by eight radiation oncologists demonstrates that RT-SAM contours are clinically indistinguishable from expert delineations in blinded Turing assessments, achieve superior quality ratings in 75% of comparisons with mean scores of 2.73 for RT-SAM versus 2.66 for manual expert contours, and attain clinically acceptable ratings in over 97% of cases. These results demonstrate that RT-SAM is a clinically feasible solution for automated CTV contouring, with strong potential to standardize treatment planning and mitigate inter-observer variability in NPC radiotherapy.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147348038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fall Warning Method Based on Multimodal Sensor Fusion and Gait Phase Detection
Wenxuan Zhang, Qian Liang, Xiaohui Jia, Chunhu Bian, Yuxuan Guo, Tiejun Li, Jinyue Liu
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669717
Falls are a common and serious cause of injury among the elderly and individuals with mobility impairments. Under complex gait conditions in particular, early detection of imbalance is crucial for fall prevention. To address the limitations of existing methods in fall phase identification and the scarcity of real fall data, this study proposes a fall warning method based on multimodal sensor fusion and gait phase detection. Combining data from plantar pressure sensors and inertial measurement units, a gait phase detection module achieves a fine-grained division of the gait cycle, enhancing the system's ability to detect early imbalance features. Additionally, a hybrid dataset integrating simulated and real data is constructed, with multiple linear regression used to map simulated data accurately onto real data, mitigating the issue of limited samples. Experimental results demonstrate that the proposed method achieves an accuracy of 94.8%, a recall of 92.8%, and a precision of 94.2%. It further maintains stable performance in cross-subject tests and multi-scenario evaluations, demonstrating strong reliability and generalization capability.
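The simulation-to-real mapping described above can be sketched with scikit-learn's multi-output linear regression. The randomly generated paired features below are placeholders for paired simulated and real gait recordings, and the feature dimensionality is an illustrative assumption.

```python
# Minimal sketch: calibrate simulated gait features against paired real ones
# with multiple linear regression, then pool the mapped data into the hybrid set.
import numpy as np
from sklearn.linear_model import LinearRegression

sim_feats = np.random.rand(200, 6)        # e.g. pressure/IMU summary features
real_feats = sim_feats @ np.random.rand(6, 6) + 0.05 * np.random.rand(200, 6)

mapper = LinearRegression().fit(sim_feats, real_feats)   # multi-output fit
sim_mapped = mapper.predict(sim_feats)    # simulation expressed in "real" space
# sim_mapped can now be pooled with real samples to form the hybrid dataset.
```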
{"title":"Fall Warning Method Based on Multimodal Sensor Fusion and Gait Phase Detection.","authors":"Wenxuan Zhang, Qian Liang, Xiaohui Jia, Chunhu Bian, Yuxuan Guo, Tiejun Li, Jinyue Liu","doi":"10.1109/JBHI.2026.3669717","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669717","url":null,"abstract":"<p><p>Falls are a common and serious cause of injury among the elderly and individuals with mobility impairments. In particular, under complex gait conditions, the early detection of imbalance is crucial for fall prevention. To address the limitations of existing methods in fall phase identification and the scarcity of real fall data, this study proposes a fall warning method based on multimodal sensor fusion and gait phase detection. By combining data from plantar pressure sensors and inertial measurement units, a gait phase detection module is introduced to achieve fine division of the gait cycle, enhancing the system's ability to detect early imbalance features. Additionally, a hybrid dataset integrating simulation data with real data is constructed, and multiple linear regression is used to accurately map simulation and real data, mitigating the issue of limited samples. Experimental results demonstrate that the proposed method achieves an accuracy of 94.8%, a recall of 92.8%, and a precision of 94.2%. It further maintains stable performance in cross-subject tests and multi-scenario evaluations, demonstrating strong reliability and generalization capability.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147348202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Revealing Sleep Dynamics with PCT-CRV: A Novel Approach for Automatic Sleep Staging and Tracking Transitions using PSG Signals
Tehreem Fatima Zaidi, Abhishek Dixit, Deepak Joshi, Shiv Dutt Joshi
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3668939
Accurate polysomnography (PSG)-based sleep staging is essential for monitoring sleep quality and sleep-related disorders. Despite previous attempts to improve automatic sleep staging, existing methods have certain limitations: 1) they neglect synchronization patterns in the time-frequency (TF) domain, 2) they do not utilize both local and global features within sleep epochs, and 3) they neglect correlation patterns for tracking transitions between sleep stages. To address these issues, we propose a novel framework based on the polynomial chirplet transform-derived characteristic response vector (PCT-CRV) for the assessment of sleep stages. We perform the time-domain PCT (TPCT) and frequency-domain PCT (FPCT) to enhance the TF representation of nonstationary PSG signals. From these PCT representations, we construct correlation matrices across frequency bins within short-time windows to obtain characteristic response vectors (CRVs), the sums of eigenvectors weighted by their corresponding eigenvalues. Subsequently, a comprehensive set of local and global features is derived from the PCT-CRVs and fed to various machine learning classifiers. PCT-CRV excels on three datasets, surpassing existing methods as well as wavelet-based and synchrosqueezing-based CRV variants. Furthermore, to track sleep stage transitions, we form sub-band PCT-CRVs using the most informative eigenvectors, guided by the physics of the problem. We hypothesize that sleep stages are characterized by specific correlation profiles within different frequency bins; hence, sub-band PCT-CRVs corresponding to the dominant eigenvectors can detect sleep stage transitions across all epochs. These results highlight the efficacy of our method in tracking sleep stage transitions and improving classification performance.
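The CRV construction itself is compact: within each short-time window, eigendecompose the correlation matrix across frequency bins and sum the eigenvectors weighted by their eigenvalues. A minimal NumPy sketch is below; window extraction and the PCT itself are outside its scope, and the NaN guard for flat bins is an added assumption.

```python
# Minimal sketch of a characteristic response vector (CRV) from one window
# of a time-frequency representation.
import numpy as np

def crv(tf_window):
    """tf_window: (F, T) TF magnitudes within one short-time window."""
    corr = np.corrcoef(tf_window)           # (F, F) correlation across bins
    corr = np.nan_to_num(corr)              # guard bins with zero variance
    evals, evecs = np.linalg.eigh(corr)     # symmetric eigendecomposition
    return evecs @ evals                    # sum_i lambda_i * v_i  -> (F,)
```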
{"title":"Revealing Sleep Dynamics with PCT-CRV: A Novel Approach for Automatic Sleep Staging and Tracking Transitions using PSG Signals.","authors":"Tehreem Fatima Zaidi, Abhishek Dixit, Deepak Joshi, Shiv Dutt Joshi","doi":"10.1109/JBHI.2026.3668939","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3668939","url":null,"abstract":"<p><p>Polysomnography (PSG)-based accurate sleep staging is essential to monitor sleep quality and sleep-related disorders. Despite previous attempts for improving the performance of automatic sleep staging, there are certain limitations: 1) neglecting synchronization patterns in their time-frequency (TF) domain, 2) not utilizing both local and global features within sleep epochs, and 3) neglecting correlation patterns for tracking transitions between sleep stages. To address them, we propose a novel framework based on the polynomial chirplet transform-derived characteristic response vector (PCT-CRV) for the assessment of sleep stages. In this work, we perform the time-domain PCT (TPCT) and frequency-domain PCT (FPCT) to enhance the TF representation of nonstationary PSG signals. From these PCT representations, we construct correlation matrices across their frequency bins within short-time windows to obtain characteristic response vectors (CRVs), which are the sums of eigenvectors, weighted by their corresponding eigenvalues. Subsequently, a comprehensive set of local and global features is derived from PCT-CRVs, which is subjected to various machine learning-based classifiers. Our PCT-CRV excels on three datasets, surpassing existing methods, and outperforming wavelet-based and synchrosqueezed-based CRV methods. Furthermore, to track transitions of sleep stages, we form sub-band PCT-CRVs using eigenvectors with maximum information, depending upon the physics of our problem. We hypothesize that sleep stages are characterized by specific correlation profiles, within different frequency bins. Hence, sub-band PCT-CRVs corresponding to the dominant eigenvectors, would detect transition of sleep stages across all epochs. All these results highlight the efficacy of our method in tracking sleep stage transitions and improving their classification performance.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147347831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A 6G-Enabled Hierarchical Contrastive Learning Framework for Multi-Scale Medical Time Series Analysis
Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669510
Le Sun, Jie Lin, Zhiguo Qu, Yimin Yu, Jinliang Liu, Deepak Gupta, Yanchun Zhang
Medical time series analysis, particularly for electrocardiogram (ECG) and electroencephalogram (EEG) signals, is essential in modern diagnostics, supporting early detection of conditions such as arrhythmias and epileptic seizures. However, existing approaches often struggle to capture multi-scale periodic patterns and long-range dependencies while meeting real-time processing demands. The envisioned 6G networks, with their terahertz communication and integrated sensing and communication (ISAC) capabilities, will generate vast volumes of high-fidelity physiological data at the network edge. This paradigm shift intensifies the conflict between the computational complexity of advanced AI models and the limited resources of edge devices, creating a critical bottleneck for deploying sophisticated analytics in real-world healthcare scenarios. To overcome these limitations, this paper introduces a 6G-enabled framework, Hierarchical Contrastive Learning for Multi-Scale Medical time series analysis (HCL-MSM), which integrates three core components: a signal-adaptive encoder based on multi-period decomposition and 2D convolution, a patient-level contrastive module enhanced with decomposable multi-scale mixing, and a 6G-edge deployment module optimized via quantization and pruning. The framework effectively models nested physiological rhythms and cross-time dependencies in medical data while maintaining low-latency operation in resource-constrained edge environments. We evaluated HCL-MSM on multiple clinical datasets under simulated 6G settings: the framework achieves significant gains in arrhythmia detection (F1-score: 86.39%), seizure prediction (recall: 87.72%), and neurological monitoring (recall: 87.8%), outperforming existing state-of-the-art methods.
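The encoder's multi-period decomposition can be pictured as FFT-based period folding: pick dominant periods from the amplitude spectrum and reshape the 1D series into (cycles x period) maps for 2D convolution. This TimesNet-style sketch is an assumption about the mechanism, not HCL-MSM's published encoder.

```python
# Minimal sketch: fold a 1D signal by its dominant FFT periods so that a 2D
# convolution can see intra- and inter-period structure at once.
import torch

def fold_by_dominant_periods(x, k=2):
    """x: (T,) signal; returns a list of (period, 2D view) pairs."""
    amp = torch.fft.rfft(x).abs()
    amp[0] = 0                                     # ignore the DC component
    freqs = amp.topk(k).indices.clamp(min=1)       # dominant frequency bins
    views = []
    for f in freqs:
        period = max(1, x.numel() // int(f))
        ncyc = -(-x.numel() // period)             # ceil division
        padded = torch.nn.functional.pad(x, (0, ncyc * period - x.numel()))
        views.append((period, padded.view(ncyc, period)))   # rows = cycles
    return views
```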
{"title":"A 6G-Enabled Hierarchical Contrastive Learning Framework for Multi-Scale Medical Time Series Analysis.","authors":"Le Sun, Jie Lin, Zhiguo Qu, Yimin Yu, Jinliang Liu, Deepak Gupta, Yanchun Zhang","doi":"10.1109/JBHI.2026.3669510","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669510","url":null,"abstract":"<p><p>Medical time series analysis, particularly for electrocardiogram (ECG) and electroencephalogram (EEG) signals, is essential in modern diagnostics, supporting early detection of conditions such as arrhythmias and epileptic seizures. However, existing approaches often struggle to capture multi-scale periodic patterns and longrange dependencies while meeting real-time processing demands. The envisioned 6G networks, with their terahertz communication and integrated sensing and communication (ISAC) capabilities, will generate vast volumes of high-fidelity physiological data at the network edge. This paradigm shift intensifies the conflict between the computational complexity of advanced AI models and the limited resources of edge devices, creating a critical bottleneck for deploying sophisticated analytics in real-world healthcare scenarios. To overcome these limitations, this paper introduces a 6G-enabled hierarchical contrastive learning framework, referred to as Hierarchical Contrastive Learning for Multi-Scale Medical time series analysis (HCL-MSM), which integrates three core components: a signal-adaptive encoder based on multi-period decomposition and 2D convolution, a patient-level contrastive module enhanced with decomposable multi-scale mixing, and a 6G-edge deployment module optimized via quantization and pruning. The framework effectively models nested physiological rhythms and cross-time dependencies in medical data, while maintaining low-latency operation under resource-constrained edge environments. We evaluated HCL-MSM on multiple clinical datasets under simulated 6G settings. Our framework achieves significant gains in arrhythmia detection, seizure prediction, and neurological monitoring.We evaluated HCL-MSM on multiple clinical datasets under simulated 6G settings. Our framework achieves significant gains in arrhythmia detection (F1-score: 86.39 percent), seizure prediction (Recall: 87.72 percent), and neurological monitoring (Recall: 87.8 percent), outperforming existing state-of- the-art methods.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147348233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BioFPT: Biosignal Feature Pyramid Transformer for Self-Supervised Representation Learning from ECG Signals
Haobo Meng, Caiyuan Zhang, Fangfang Jiang, Ziyu Zhu, Ao Sun, Junxin Chen
Pub Date: 2026-03-02 | DOI: 10.1109/JBHI.2026.3669166
Electrocardiogram (ECG) analysis represents a promising field for deep learning applications in clinical diagnostics. However, practical use of current methods is still constrained by their heavy reliance on large amounts of labeled data, as well as by limitations in processing efficiency and signal quality. To address these challenges, we present BioFPT (Biosignal Feature Pyramid Transformer), a novel self-supervised learning framework designed for ECG signals. The framework incorporates a Split Mask-Join (SMJ) transformation as its pre-training strategy, complemented by an overlapping embedding mechanism that eliminates the need for positional encoding. Architectural efficiency is enhanced through a Spatial Reduction Attention (SRA) transformer, which reduces computational complexity without performance degradation. Comprehensive evaluation on seven public ECG datasets comprising over 94,000 subjects demonstrates BioFPT's effectiveness, with an accuracy improvement of 4.2% and a parameter reduction of 14.8% compared to state-of-the-art models. Furthermore, it maintains robust performance across diverse pathological conditions and signal qualities. The proposed architecture represents a significant advancement in self-supervised ECG analysis, particularly suitable for scenarios with limited labeled data, and its versatile design shows promise for broader applications across various biosignals.
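Spatial reduction attention lowers the cost of self-attention by shrinking the key/value sequence before the attention product, so complexity drops from O(N²) to roughly O(N·N/r). A minimal PyTorch sketch follows; the pooling-based reduction and the head count are illustrative assumptions standing in for BioFPT's exact SRA design.

```python
# Minimal sketch: queries keep full resolution, keys/values are downsampled.
import torch
import torch.nn as nn

class SRAttention(nn.Module):
    def __init__(self, dim, heads=4, reduction=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.reduce = nn.AvgPool1d(reduction, stride=reduction)

    def forward(self, x):                    # x: (B, N, D) token sequence
        kv = self.reduce(x.transpose(1, 2)).transpose(1, 2)   # (B, N/r, D)
        out, _ = self.attn(x, kv, kv)        # attention over shortened keys
        return out
```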
{"title":"BioFPT: Biosignal Feature Pyramid Transformer for self-supervised representation learning from ECGsignals.","authors":"Haobo Meng, Caiyuan Zhang, Fangfang Jiang, Ziyu Zhu, Ao Sun, Junxin Chen","doi":"10.1109/JBHI.2026.3669166","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669166","url":null,"abstract":"<p><p>Electrocardiogram (ECG) analysis represents a promising field for deep learning applications in clinical diagnostics. However, practical use of current methods is still constrained by their heavy reliance on large amounts of labeled data, as well as limitations in processing efficiency and signal quality. To address these challenges, we present BioFPT (Biosignal Feature Pyramid Transformer), a novel self-supervised learning framework designed for ECG signal. The proposed framework incorporates a Split Mask-Join (SMJ) transformation for a Pre-training strategy, complemented by an overlapping embedding mechanism that eliminates positional encoding requirements. The efficiency of the architectural design is enhanced through a Spatial Reduction Attention (SRA) transformer, which achieves a reduction in computational complexity without performance degradation. Comprehensive evaluation of seven public ECG datasets comprising over 94,000 sub jects demonstrates BioFPT's effectiveness, an accuracy improvement of 4.2% and a parameter reduction of 14.8% compared to state-of-the-art models. Furthermore, it maintains robust performance across diverse pathological conditions and signal qualities. The proposed architecture rep resents a significant advancement in self-supervised ECG analysis, particularly suitable for scenarios with limited labeled data availability. Moreover, its versatile architecture shows promise for broader applications across various biosignals.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147344078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-source Unsupervised Domain Adaptation Fundus Lesion Segmentation of Various OCT Devices with Moment Consistency
Dehui Xiang, Guohao Zhang, Zhongyu Chen, Weifang Zhu, Fei Shi, Tao Peng, Xinjian Chen, Haoyu Chen
Pub Date: 2026-03-02 | DOI: 10.1109/JBHI.2026.3669176
Accurate segmentation of lesions in fundus OCT images can assist ophthalmologists in determining the degree of retinopathy and choroidopathy. However, OCT images are often acquired from various manufacturers' OCT devices, which is challenging for traditional models due to domain shift. In this paper, a novel multi-source domain adaptation framework is designed to address the challenge of segmenting fundus lesions in OCT images acquired from devices produced by different manufacturers, with three core methodological innovations: (1) A multi-order moment consistency approach using the moment generating function (MGF) to align feature distributions across domains. By approximating multi-order central moments with derivatives of the MGF, our method theoretically enables efficient alignment of high-order statistical features without explicitly computing polynomial expansions. (2) A perturbation-based feature consistency strategy to improve model robustness. By using segmentation and moment losses to guide perturbation generation, our method explicitly links semantic consistency with feature distribution alignment. (3) A population stability whitening technique to separate style-related and content-related features. By analyzing the variance of covariance matrices across perturbations, our method automatically separates style and content features. Our method is compared with several state-of-the-art approaches on two datasets comprising diverse domains collected from various manufacturers' OCT devices. Experimental results clearly demonstrate its significant superiority.
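Innovation (1) ultimately aligns multi-order central moments of features across domains. A minimal sketch of such a moment-matching loss is below; it computes the moments directly rather than via MGF derivatives, and the squared-difference penalty and chosen orders are illustrative assumptions.

```python
# Minimal sketch: penalize differences in the first few central moments of
# per-dimension feature statistics between two domains.
import torch

def moment_alignment_loss(fs, ft, orders=(1, 2, 3)):
    """fs, ft: (N, D) feature batches from two domains."""
    loss = 0.0
    mu_s, mu_t = fs.mean(0), ft.mean(0)
    for k in orders:
        if k == 1:
            loss = loss + (mu_s - mu_t).pow(2).mean()    # align means
        else:
            ms = (fs - mu_s).pow(k).mean(0)              # k-th central moment
            mt = (ft - mu_t).pow(k).mean(0)
            loss = loss + (ms - mt).pow(2).mean()
    return loss
```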
{"title":"Multi-source Unsupervised Domain Adaptation Fundus Lesion Segmentation of Various OCT Devices with Moment Consistency.","authors":"Dehui Xiang, Guohao Zhang, Zhongyu Chen, Weifang Zhu, Fei Shi, Tao Peng, Xinjian Chen, Haoyu Chen","doi":"10.1109/JBHI.2026.3669176","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669176","url":null,"abstract":"<p><p>Accurate segmentation of lesion in fundus OCT images an assist ophthalmologists to determine the degree of retinopathy and choroidopathy. However, OCT images are often acquired from various manufacturers' OCT devices, which is challenging for traditional models due to domain shift. In this paper, a novel multi-source domain adaptation framework is designed to address the challenge of segmenting fundus lesions in OCT images acquired from devices produced by different manufacturers with three core methodological innovations: (1) A multi-order moment consistency approach using moment generating function (MGF) to align feature distributions across domains. By approximating multi-order central moments using derivatives of the MGF, our method theoretically enables efficient alignment of high-order statistical features without explicit computation of polynomial expansions. (2) A perturbation-based feature consistency strategy to improve model robustness. By using segmentation and moment losses to guide perturbation generation, our method explicitly links semantic consistency with feature distribution alignment. (3) A population stability whitening technique to separate style-related and content-related features. By analyzing covariance matrix variances across perturbations, our method attempts to automatically separate style and content features. Our method is compared with several state-of-the-art approaches on two datasets, comprising diverse domains collected from various manufacturers' OCT devices. Experimental results clearly demonstrate the significant superiority of our method.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147344108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Delay-Aware Cross-Modal Knowledge Distillation for Driver Vigilance Estimation: Toward Practical Edge Deployment
Yu Sun, Shiwu Li, Tongtong Jin, Yiming Bie, Mengzhu Guo, Minghao Fu, Xin Huang
Pub Date: 2026-03-02 | DOI: 10.1109/JBHI.2026.3669242
Efficient vigilance estimation in driving scenarios requires a balance between model performance and practicality. Electroencephalography (EEG), which directly reflects brain activity, is widely used for vigilance estimation, but its acquisition is complicated and difficult to apply in real-world driving. In contrast, physiological signals such as the electrooculogram, electrodermal activity, and photoplethysmography are better suited to practical deployment, but the information they provide is relatively limited. To address these issues, we propose a delay-aware cross-modal knowledge distillation method. EEG signals are used only to train the teacher model. An information-theoretic criterion based on mutual information and response delay then determines which physiological signals are suitable as student modalities for distilling knowledge from the EEG-based teacher. On this basis, considering the inherent temporal differences arising from the varying sensitivities of physiological signals to cognitive responses, we propose a delay-aware soft alignment mechanism (DASA). DASA handles the temporal misalignment between physiological signals and captures the asynchronous dynamics of EEG and the other signals by introducing learnable delay and spread parameters at the patch level, achieving soft, temporally aligned supervision from teacher to student. Finally, an objective function incorporating cross-modal consistency, patch-level alignment, and smooth regularization supports effective training of the proposed method. Extensive experiments on the MMV and SEED-VIG datasets validate that the proposed method outperforms existing methods in estimation accuracy and temporal alignment while maintaining the real-time performance required for edge deployment.
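DASA's learnable delay and spread can be pictured as a per-patch Gaussian re-weighting of teacher patches before the distillation loss. A minimal PyTorch sketch under that assumption is below; the parameterization and the MSE alignment loss are illustrative, not the paper's exact mechanism.

```python
# Minimal sketch: each student patch t is supervised by a Gaussian-weighted
# mix of teacher patches centred at t + delay, with a learnable spread.
import torch
import torch.nn as nn

class DelayAwareAlign(nn.Module):
    def __init__(self, num_patches):
        super().__init__()
        self.delay = nn.Parameter(torch.zeros(num_patches))   # learnable shift
        self.spread = nn.Parameter(torch.ones(num_patches))   # learnable width

    def forward(self, student, teacher):     # both (B, P, D)
        P = teacher.size(1)
        pos = torch.arange(P, device=teacher.device).float()
        centre = pos + self.delay                               # (P,)
        w = torch.exp(-0.5 * ((pos[None, :] - centre[:, None])
                              / self.spread[:, None].clamp(min=1e-3)) ** 2)
        w = w / w.sum(dim=1, keepdim=True)                      # (P, P) weights
        aligned = torch.einsum("pq,bqd->bpd", w, teacher)       # soft-shifted
        return (student - aligned).pow(2).mean()                # alignment loss
```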
{"title":"Delay-Aware Cross-Modal Knowledge Distillation for Driver Vigilance Estimation: Toward Practical Edge Deployment.","authors":"Yu Sun, Shiwu Li, Tongtong Jin, Yiming Bie, Mengzhu Guo, Minghao Fu, Xin Huang","doi":"10.1109/JBHI.2026.3669242","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3669242","url":null,"abstract":"<p><p>Efficient vigilance estimation in driving scenarios requires a balance between model performance and practicality. Electroencephalography (EEG), which can directly reflect brain activity, is widely used for vigilance estimation, but its acquisition process is complicated and difficult to apply to real-world driving. In contrast, physiological signals such as electrooculogram, electrodermal activity, and photoplethysmography have more advantages for practical deployment, but the information they provide is relatively limited. To address the above issues, we propose a delay-aware cross-modal knowledge distillation method. EEG signals are only used to train the teacher model. Then, an information-theoretic criterion based on mutual information and response delay is employed to determine which physiological signals are suitable as student modality for knowledge distillation from the EEG-based teacher model. On this basis, considering the inherent temporal differences caused by different physiological signals with varying sensitivities to cognitive responses, a delay-aware soft alignment mechanism (DASA) is proposed, which handles the temporal misalignment of different physiological signals and captures the asynchronous dynamics of the EEG and other physiological signals through the introduction of learnable delay and spread parameters at the patch level, to achieve soft, temporally-aligned supervision from the teacher to the student model. Finally, an objective function incorporating cross-modal consistency, patch level alignment, and smooth regularization is designed to support the effective training of the proposed cross-modal knowledge distillation method. Extensive experiments on MMV and SEED-VIG datasets validates that the proposed method outperforms existing methods in terms of estimation accuracy and temporal alignment while maintaining the real-time performance required for edge deployment.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147344080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}