
Latest articles from IEEE Journal of Biomedical and Health Informatics

Multi-level Asymmetric Contrastive Learning for Medical Image Segmentation Pre-training.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3669549
Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Zhaoheng Xie, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, May D Wang, Yanye Lu

Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. First, existing medical contrastive learning strategies focus on extracting image-level representations, ignoring the abundant multi-level representations. Second, they underutilize the decoder, either initializing it randomly or pre-training it separately from the encoder, thereby neglecting the potential collaboration between encoder and decoder. To address these issues, we propose MACL, a novel multi-level asymmetric contrastive learning framework for enhancing medical image segmentation. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously, providing better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations, ensuring that the encoder and decoder capture comprehensive details from representations of varying scales and granularities during pre-training. Finally, experiments on 8 medical image datasets show that MACL outperforms 11 existing contrastive learning strategies: it yields visibly more precise predictions and Dice scores 1.72%, 7.87%, 2.49%, and 1.48% higher than the previous best results on ACDC, MMWHS, HVSMR, and CHAOS, respectively, with 10% labeled data. MACL also generalizes well across 5 U-Net variant backbones.
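Contrastive objectives like those described above are typically built on an InfoNCE-style loss that pulls an anchor representation toward a positive view and pushes it away from negatives. The sketch below is a generic, illustrative version on plain Python vectors; the function name, cosine scoring, and temperature value are assumptions, not the paper's exact multi-level losses.

```python
import math

def info_nce(anchor, positive, negatives, tau=0.1):
    """Toy InfoNCE-style contrastive loss on plain-Python vectors.

    Illustrative only: MACL's feature-, image-, and pixel-level losses
    are not reproduced here; cosine similarity and tau are assumptions.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos(a, b):
        na = math.sqrt(dot(a, a)) or 1.0
        nb = math.sqrt(dot(b, b)) or 1.0
        return dot(a, b) / (na * nb)

    # Softmax over the positive pair versus all negative pairs.
    pos = math.exp(cos(anchor, positive) / tau)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

A well-aligned positive yields a near-zero loss, while a mismatched one is heavily penalized, which is what drives the representations apart during pre-training.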

Citations: 0
Improving Conversational Literature Retrieval Quality via Personalized Profile-Based Re-ranking.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3669741
ShuaiYu Zhang, Huihui Shao, Zhenping Xie

Academic literature retrieval is constrained by the paradox of "information overload" versus "evidence scarcity", a tension that deepens when researchers iteratively refine their queries in multi-turn conversational settings. To address this challenge, we propose Conversational Literature Personalized Re-ranking (CLPR), a personalized framework that unifies dense semantic retrieval with personalized user profiling. CLPR first performs a broad high-recall retrieval to collect candidate documents, then compresses conversational history into a concise textual profile that encodes sequential continuity, immediate focus, and long-term research background via a large language model. The generated profile serves as a pseudo-query for a neural cross-encoder to produce the final ranking. Cross-domain testing on the public LitSearch (computer science) benchmark confirms its robust generalization, yielding an NDCG@10 of 0.4793. On MedCorpus, a new multi-turn biomedical conversational retrieval benchmark constructed for this study, CLPR attains state-of-the-art performance with P@1 = 0.9497 and NDCG@10 = 0.9271, surpassing the strongest baseline by substantial margins. Ablation shows long-term background cues contribute most, and maintaining a short, up-to-date profile across turns outperforms a static one. CLPR therefore delivers accurate, personalized literature retrieval and can accelerate evidence synthesis across scientific domains.
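The final stage described above, scoring candidates against the profile-derived pseudo-query and re-ordering the high-recall retrieval list, can be sketched as follows. A simple token-overlap scorer stands in for the neural cross-encoder; the function names and scoring rule are illustrative assumptions, not CLPR's implementation.

```python
def rerank(pseudo_query, candidates):
    """Re-order a candidate list by similarity to a pseudo-query.

    Jaccard token overlap substitutes for a neural cross-encoder
    purely for illustration of the re-ranking step.
    """
    q = set(pseudo_query.lower().split())

    def score(doc):
        d = set(doc.lower().split())
        return len(q & d) / (len(q | d) or 1)

    return sorted(candidates, key=score, reverse=True)
```

In the full system the pseudo-query would be the LLM-generated profile text, and the scorer a trained cross-encoder producing the final NDCG-optimized ranking.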

Citations: 0
R2GenCSR: Mining Contextual and Residual Information for LLMs-based Radiology Report Generation.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3669539
Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang

Inspired by the tremendous success of Large Language Models (LLMs), existing radiology report generation methods attempt to leverage large models to achieve better performance. They typically adopt a Transformer to extract the visual features of a given X-ray image and then feed them into the LLM for text generation. Extracting more effective information to help the LLM improve its final output remains an urgent, unsolved problem. Additionally, visual Transformer models bring high computational complexity. To address these issues, this paper proposes a novel context-guided, efficient radiology report generation framework. Specifically, we adopt Mamba as the vision backbone for its linear complexity, obtaining performance comparable to a strong Transformer model. More importantly, during training we retrieve context from the training set for the samples in each mini-batch, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements into the LLM to generate high-quality medical reports. Extensive experiments on three X-ray report generation datasets (IU X-Ray, MIMIC-CXR, and CheXpert Plus) fully validate the effectiveness of the proposed model. The source code is available at https://github.com/Event-AHU/Medical_Image_Analysis.
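The mini-batch context retrieval step can be sketched as picking, for each query feature, the most similar training-set feature sharing its label (positive context) and the most similar one with a different label (negative context). Cosine similarity and the pairing rule here are assumptions for illustration, not R2GenCSR's exact retrieval scheme.

```python
import math

def retrieve_context(query, bank, labels, query_label):
    """Return (positive, negative) context features for one query.

    bank: list of feature vectors from the training set
    labels: matching list of class labels
    Positive = nearest same-label feature, negative = nearest
    different-label feature, by cosine similarity (illustrative).
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    pos = max((f for f, l in zip(bank, labels) if l == query_label),
              key=lambda f: cos(query, f))
    neg = max((f for f, l in zip(bank, labels) if l != query_label),
              key=lambda f: cos(query, f))
    return pos, neg
```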

Citations: 0
RT-SAM: Visual-Prompt Fusion and Uncertainty Enhancement for Nasopharyngeal Carcinoma Radiotherapy Target Delineation.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3669979
Hee Guan Khor, Xin Yang, Yihua Sun, Sijuan Huang, Yingni Wang, Jie Wang, Shaobin Wang, Lu Bai, Longfei Ma, Hongen Liao

Precise delineation of the clinical target volume (CTV) and nodal CTV (CTV$_{\mathit{nd}}$) is crucial for effective radiotherapy planning in nasopharyngeal carcinoma (NPC). Manual contouring is labor-intensive and subject to substantial inter-observer variability, particularly in regions with complex anatomy and indistinct boundaries. This study presents RT-SAM, a novel framework that adapts the Medical Segment Anything Model 2 (MedSAM-2) for automated contouring of the CTV (i.e., primary CTV and CTV$_{\mathit{nd}}$) in NPC computed tomography (CT) images. The framework synergistically integrates a generalist foundation model (MedSAM-2) with a domain-specific specialist network (2D U-Net) through three principal contributions: (1) automated generation of multi-modal prompts (mask, bounding box, and point representations) derived from specialist network predictions to guide the generalist model; (2) a Visual-Prompt Fusion Attention (ViPFA) mechanism that optimizes feature-prompt interactions through bidirectional cross-modal attention; and (3) an Uncertainty-Enhanced Prediction Adjustment (UEPA) mechanism that enhances model robustness via confidence-based refinement and selective domain adaptation. Comprehensive evaluation on a multi-center cohort of 256 clinical NPC cases from Sun Yat-sen University Cancer Center and 212 public NPC cases from the SegRap2025 lymph node CTV dataset, using 5-fold cross-validation, demonstrates that RT-SAM achieves a mean Dice coefficient of 0.796 $\pm$ 0.033 (mean $\pm$ standard deviation), significantly outperforming current state-of-the-art methods. Clinical validation by eight radiation oncologists demonstrates that RT-SAM contours are clinically indistinguishable from expert delineations in blinded Turing assessments, achieve superior quality ratings in 75% of comparisons (mean scores of 2.73 for RT-SAM versus 2.66 for manual expert contours), and attain clinically acceptable ratings in over 97% of cases.
These results demonstrate that RT-SAM is a clinically feasible solution for automated CTV contouring, with strong potential to standardize treatment planning and mitigate inter-observer variability in NPC radiotherapy.
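The Dice coefficient that RT-SAM reports can be computed for binary segmentation masks as twice the intersection over the sum of mask sizes. A minimal helper, assuming flat 0/1 masks (the function name and flattened input format are illustrative):

```python
def dice(pred, target):
    """Dice similarity coefficient for two binary masks.

    pred, target: equal-length sequences of 0/1 values
    (a segmentation mask flattened to one dimension).
    Returns 1.0 for two empty masks by convention.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0
```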

Citations: 0
Fall Warning Method Based on Multimodal Sensor Fusion and Gait Phase Detection.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3669717
Wenxuan Zhang, Qian Liang, Xiaohui Jia, Chunhu Bian, Yuxuan Guo, Tiejun Li, Jinyue Liu

Falls are a common and serious cause of injury among the elderly and individuals with mobility impairments. In particular, under complex gait conditions, the early detection of imbalance is crucial for fall prevention. To address the limitations of existing methods in fall phase identification and the scarcity of real fall data, this study proposes a fall warning method based on multimodal sensor fusion and gait phase detection. By combining data from plantar pressure sensors and inertial measurement units, a gait phase detection module is introduced to achieve a fine-grained division of the gait cycle, enhancing the system's ability to detect early imbalance features. Additionally, a hybrid dataset integrating simulation data with real data is constructed, and multiple linear regression is used to accurately map simulation data to real data, mitigating the issue of limited samples. Experimental results demonstrate that the proposed method achieves an accuracy of 94.8%, a recall of 92.8%, and a precision of 94.2%. It further maintains stable performance in cross-subject tests and multi-scenario evaluations, demonstrating strong reliability and generalization capability.
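The simulation-to-real mapping step uses linear regression. The paper's regression is multivariate, but the idea can be shown with a univariate ordinary-least-squares fit; the function name and single-feature form are simplifications.

```python
def fit_linear(xs, ys):
    """Closed-form OLS fit y = a + b*x for one feature.

    Sketches the mapping from simulated sensor readings (xs) to
    real readings (ys); the study's multiple regression extends
    this to several features.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b
```

Once fitted on paired simulated/real samples, the mapping lets simulated fall data augment the scarce real fall recordings on a common scale.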

Citations: 0
Revealing Sleep Dynamics with PCT-CRV: A Novel Approach for Automatic Sleep Staging and Tracking Transitions using PSG Signals.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3668939
Tehreem Fatima Zaidi, Abhishek Dixit, Deepak Joshi, Shiv Dutt Joshi

Polysomnography (PSG)-based accurate sleep staging is essential to monitor sleep quality and sleep-related disorders. Despite previous attempts to improve the performance of automatic sleep staging, certain limitations remain: 1) neglecting synchronization patterns in the time-frequency (TF) domain, 2) not utilizing both local and global features within sleep epochs, and 3) neglecting correlation patterns for tracking transitions between sleep stages. To address them, we propose a novel framework based on the polynomial chirplet transform-derived characteristic response vector (PCT-CRV) for the assessment of sleep stages. In this work, we perform the time-domain PCT (TPCT) and frequency-domain PCT (FPCT) to enhance the TF representation of nonstationary PSG signals. From these PCT representations, we construct correlation matrices across their frequency bins within short-time windows to obtain characteristic response vectors (CRVs), which are the sums of eigenvectors weighted by their corresponding eigenvalues. Subsequently, a comprehensive set of local and global features is derived from the PCT-CRVs and fed to various machine-learning classifiers. Our PCT-CRV excels on three datasets, surpassing existing methods as well as wavelet-based and synchrosqueezing-based CRV variants. Furthermore, to track transitions between sleep stages, we form sub-band PCT-CRVs using the most informative eigenvectors, guided by the physics of our problem. We hypothesize that sleep stages are characterized by specific correlation profiles within different frequency bins. Hence, sub-band PCT-CRVs corresponding to the dominant eigenvectors can detect transitions between sleep stages across all epochs. All these results highlight the efficacy of our method in tracking sleep stage transitions and improving their classification performance.
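The CRV construction described above (correlate frequency-bin rows within a short-time TF window, eigendecompose, and sum the eigenvectors weighted by their eigenvalues) can be sketched as below; normalization details and bin selection are assumptions, and the function name is illustrative.

```python
import numpy as np

def characteristic_response_vector(window):
    """CRV sketch: eigenvalue-weighted sum of eigenvectors of a
    frequency-bin correlation matrix.

    window: array of shape (n_bins, n_times), one short-time slice
    of a TF representation. Returns a vector of length n_bins.
    """
    c = np.corrcoef(window)          # (n_bins, n_bins) correlations
    vals, vecs = np.linalg.eigh(c)   # symmetric matrix -> eigh
    # Columns of vecs are eigenvectors; this equals sum_i vals[i]*v_i.
    return vecs @ vals
```

For two perfectly correlated bins the CRV collapses onto the single dominant eigenvector scaled by its eigenvalue, which is what makes dominant-eigenvector sub-band CRVs sensitive to stage-specific correlation profiles.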

Citations: 0
A 6G-Enabled Hierarchical Contrastive Learning Framework for Multi-Scale Medical Time Series Analysis.
IF 6.8 · CAS Tier 2 (Medicine) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2026-03-03 · DOI: 10.1109/JBHI.2026.3669510
Le Sun, Jie Lin, Zhiguo Qu, Yimin Yu, Jinliang Liu, Deepak Gupta, Yanchun Zhang

Medical time series analysis, particularly for electrocardiogram (ECG) and electroencephalogram (EEG) signals, is essential in modern diagnostics, supporting early detection of conditions such as arrhythmias and epileptic seizures. However, existing approaches often struggle to capture multi-scale periodic patterns and long-range dependencies while meeting real-time processing demands. The envisioned 6G networks, with their terahertz communication and integrated sensing and communication (ISAC) capabilities, will generate vast volumes of high-fidelity physiological data at the network edge. This paradigm shift intensifies the conflict between the computational complexity of advanced AI models and the limited resources of edge devices, creating a critical bottleneck for deploying sophisticated analytics in real-world healthcare scenarios. To overcome these limitations, this paper introduces a 6G-enabled hierarchical contrastive learning framework, referred to as Hierarchical Contrastive Learning for Multi-Scale Medical time series analysis (HCL-MSM), which integrates three core components: a signal-adaptive encoder based on multi-period decomposition and 2D convolution, a patient-level contrastive module enhanced with decomposable multi-scale mixing, and a 6G-edge deployment module optimized via quantization and pruning. The framework effectively models nested physiological rhythms and cross-time dependencies in medical data, while maintaining low-latency operation under resource-constrained edge environments. We evaluated HCL-MSM on multiple clinical datasets under simulated 6G settings.
Our framework achieves significant gains in arrhythmia detection (F1-score: 86.39 percent), seizure prediction (Recall: 87.72 percent), and neurological monitoring (Recall: 87.8 percent), outperforming existing state-of- the-art methods.
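The encoder's first step, multi-period decomposition followed by folding the series into a 2D grid for 2D convolution, can be sketched as follows. The function names and the synthetic signal are illustrative, not taken from the paper:

```python
import numpy as np

def dominant_periods(x, k=2):
    """Top-k dominant periods of a 1-D signal, read off the FFT
    amplitude spectrum (the multi-period decomposition step)."""
    spec = np.abs(np.fft.rfft(x))
    spec[0] = 0.0                          # ignore the DC component
    top_bins = np.argsort(spec)[::-1][:k]  # strongest frequency bins
    return [len(x) // b for b in top_bins if b > 0]

def fold_to_2d(x, period):
    """Reshape a 1-D series into a (cycles, period) grid so a 2-D
    convolution can see intra- and inter-period structure at once."""
    n = (len(x) // period) * period
    return x[:n].reshape(-1, period)

# Synthetic "ECG-like" signal with a strong 25-sample rhythm plus noise.
t = np.arange(500)
sig = np.sin(2 * np.pi * t / 25) + 0.1 * np.random.default_rng(0).normal(size=500)
periods = dominant_periods(sig)
grid = fold_to_2d(sig, periods[0])
print(periods[0], grid.shape)  # the 25-sample period; a grid of 20 cycles x 25 samples
```

Each dominant period yields its own 2D view; in the full model a 2D convolutional stack would process these grids and fuse the results.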

Citations: 0
BioFPT: Biosignal Feature Pyramid Transformer for self-supervised representation learning from ECG signals. 生物信号特征金字塔变压器,用于自监督表示学习。
IF 6.8 2区 医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-02 DOI: 10.1109/JBHI.2026.3669166
Haobo Meng, Caiyuan Zhang, Fangfang Jiang, Ziyu Zhu, Ao Sun, Junxin Chen

Electrocardiogram (ECG) analysis represents a promising field for deep learning applications in clinical diagnostics. However, practical use of current methods is still constrained by their heavy reliance on large amounts of labeled data, as well as limitations in processing efficiency and signal quality. To address these challenges, we present BioFPT (Biosignal Feature Pyramid Transformer), a novel self-supervised learning framework designed for ECG signals. The proposed framework incorporates a Split Mask-Join (SMJ) transformation as a pre-training strategy, complemented by an overlapping embedding mechanism that eliminates positional encoding requirements. The efficiency of the architectural design is enhanced through a Spatial Reduction Attention (SRA) transformer, which reduces computational complexity without performance degradation. Comprehensive evaluation on seven public ECG datasets comprising over 94,000 subjects demonstrates BioFPT's effectiveness, with an accuracy improvement of 4.2% and a parameter reduction of 14.8% compared to state-of-the-art models. Furthermore, it maintains robust performance across diverse pathological conditions and signal qualities. The proposed architecture represents a significant advancement in self-supervised ECG analysis, particularly suitable for scenarios with limited labeled data availability. Moreover, its versatile architecture shows promise for broader applications across various biosignals.
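A split-mask-join style pretraining transform can be illustrated with a minimal sketch: split the signal into patches, mask a random fraction, and join the patches back for a reconstruction objective. The patch length, mask ratio, and zero-masking rule here are assumptions for illustration; BioFPT's exact SMJ scheme may differ:

```python
import numpy as np

def split_mask_join(x, patch_len=50, mask_ratio=0.4, seed=0):
    """Split a 1-D signal into patches, zero out a random fraction of
    them, and join the patches back. A reconstruction objective would
    then recover the masked patches from the visible ones."""
    rng = np.random.default_rng(seed)
    n = (len(x) // patch_len) * patch_len
    patches = x[:n].reshape(-1, patch_len).copy()
    n_mask = int(round(mask_ratio * len(patches)))
    masked_idx = rng.choice(len(patches), size=n_mask, replace=False)
    patches[masked_idx] = 0.0          # zero-masking, one choice of many
    return patches.reshape(-1), masked_idx

sig = np.sin(np.linspace(0, 20 * np.pi, 500))
masked, idx = split_mask_join(sig)
print(masked.shape, len(idx))          # 500 samples back; 4 of 10 patches masked
```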

Citations: 0
Multi-source Unsupervised Domain Adaptation Fundus Lesion Segmentation of Various OCT Devices with Moment Consistency. 基于矩一致性的多源无监督域自适应OCT眼底病灶分割。
IF 6.8 2区 医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-02 DOI: 10.1109/JBHI.2026.3669176
Dehui Xiang, Guohao Zhang, Zhongyu Chen, Weifang Zhu, Fei Shi, Tao Peng, Xinjian Chen, Haoyu Chen

Accurate segmentation of lesions in fundus OCT images can assist ophthalmologists in determining the degree of retinopathy and choroidopathy. However, OCT images are often acquired from various manufacturers' devices, which is challenging for traditional models due to domain shift. In this paper, a novel multi-source domain adaptation framework is designed to address the challenge of segmenting fundus lesions in OCT images acquired from devices produced by different manufacturers, with three core methodological innovations: (1) A multi-order moment consistency approach using the moment generating function (MGF) to align feature distributions across domains. By approximating multi-order central moments using derivatives of the MGF, our method theoretically enables efficient alignment of high-order statistical features without explicit computation of polynomial expansions. (2) A perturbation-based feature consistency strategy to improve model robustness. By using segmentation and moment losses to guide perturbation generation, our method explicitly links semantic consistency with feature distribution alignment. (3) A population stability whitening technique to separate style-related and content-related features. By analyzing covariance matrix variances across perturbations, our method automatically separates style and content features. Our method is compared with several state-of-the-art approaches on two datasets comprising diverse domains collected from various manufacturers' OCT devices. Experimental results clearly demonstrate the significant superiority of our method.
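The idea behind the multi-order moment consistency term can be sketched by matching the first few central moments of source and target feature batches. This is a simplification: the paper approximates these moments via derivatives of the MGF rather than computing them explicitly as below:

```python
import numpy as np

def central_moments(f, k=4):
    """First k central moments (orders 1..k) of a feature batch."""
    mu = f.mean()
    return np.array([((f - mu) ** n).mean() for n in range(1, k + 1)])

def moment_loss(src, tgt, k=4):
    """Squared distance between the central moments of two domains;
    minimizing it pulls the feature distributions together."""
    return float(np.sum((central_moments(src, k) - central_moments(tgt, k)) ** 2))

rng = np.random.default_rng(0)
src         = rng.normal(0.0, 1.0, 10_000)  # source-domain features
tgt_near    = rng.normal(0.0, 1.0, 10_000)  # same distribution
tgt_shifted = rng.normal(2.0, 3.0, 10_000)  # domain-shifted features
print(moment_loss(src, tgt_near) < moment_loss(src, tgt_shifted))  # → True
```

Well-aligned domains score a near-zero loss, while a shifted domain is penalized through its variance and higher-order moments.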

Citations: 0
Delay-Aware Cross-Modal Knowledge Distillation for Driver Vigilance Estimation: Toward Practical Edge Deployment. 面向驾驶员警觉性估计的延迟感知跨模态知识蒸馏:面向实际边缘部署。
IF 6.8 2区 医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2026-03-02 DOI: 10.1109/JBHI.2026.3669242
Yu Sun, Shiwu Li, Tongtong Jin, Yiming Bie, Mengzhu Guo, Minghao Fu, Xin Huang

Efficient vigilance estimation in driving scenarios requires a balance between model performance and practicality. Electroencephalography (EEG), which can directly reflect brain activity, is widely used for vigilance estimation, but its acquisition process is complicated and difficult to apply to real-world driving. In contrast, physiological signals such as electrooculogram, electrodermal activity, and photoplethysmography are more practical to deploy, but the information they provide is relatively limited. To address these issues, we propose a delay-aware cross-modal knowledge distillation method. EEG signals are used only to train the teacher model. An information-theoretic criterion based on mutual information and response delay then determines which physiological signals are suitable as student modalities for knowledge distillation from the EEG-based teacher model. On this basis, considering the inherent temporal differences among physiological signals with varying sensitivities to cognitive responses, a delay-aware soft alignment mechanism (DASA) is proposed. DASA handles the temporal misalignment of different physiological signals and captures the asynchronous dynamics between EEG and other physiological signals by introducing learnable delay and spread parameters at the patch level, achieving soft, temporally aligned supervision from the teacher to the student model. Finally, an objective function incorporating cross-modal consistency, patch-level alignment, and smooth regularization is designed to support effective training of the proposed cross-modal knowledge distillation method. Extensive experiments on the MMV and SEED-VIG datasets validate that the proposed method outperforms existing methods in estimation accuracy and temporal alignment while maintaining the real-time performance required for edge deployment.
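The delay-aware soft alignment idea can be sketched as Gaussian attention over teacher time steps, parameterized by a delay (center offset) and a spread (width). In the paper both are learnable per patch; this toy version fixes them and is only a guess at the mechanism:

```python
import numpy as np

def soft_align(teacher, delay, spread):
    """Map teacher features onto the student's time axis: student step i
    attends to teacher steps with Gaussian weights centred at i - delay
    and width `spread` (both would be learnable per patch)."""
    T = len(teacher)
    i = np.arange(T)[:, None]   # student time steps
    j = np.arange(T)[None, :]   # teacher time steps
    w = np.exp(-((j - (i - delay)) ** 2) / (2.0 * spread ** 2))
    w /= w.sum(axis=1, keepdims=True)   # soft, row-normalised weights
    return w @ teacher

teacher = np.sin(np.linspace(0, 4 * np.pi, 100))
aligned = soft_align(teacher, delay=5, spread=1.0)
# away from the edges, aligned[i] closely tracks teacher[i - 5]
print(float(np.max(np.abs(aligned[20:80] - teacher[15:75]))))
```

A distillation loss computed between `aligned` teacher features and student features would then supervise the student despite the modalities' differing response lags.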

Citations: 0