
IEEE Journal of Biomedical and Health Informatics: Latest Publications

Enhancing 3D Medical Image Understanding With Pretraining Aided by 2D Multimodal Large Language Models.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3609739
Qiuhui Chen, Xuancheng Yao, Huping Ye, Yi Hong

Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport-based alignment, demonstrating greater tolerance to the noise potentially introduced by LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate our state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated dataset, and pretrained models will be made available upon acceptance.
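The partial optimal transport alignment can be sketched in a few lines of PyTorch: a cosine-cost Sinkhorn matcher with a dummy "slack" column that lets noisy slice features opt out of matching any caption, one common surrogate for partial OT. This is a minimal sketch under assumed shapes, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def partial_ot_align_loss(img_feats, txt_feats, n_iters=50, eps=0.05, slack_cost=1.0):
    """Sinkhorn-based soft alignment between slice and text embeddings.
    A slack column with fixed cost absorbs mass from features that match
    no caption -- a simple stand-in for partial optimal transport."""
    img = F.normalize(img_feats, dim=-1)                  # (N, d)
    txt = F.normalize(txt_feats, dim=-1)                  # (M, d)
    cost = 1.0 - img @ txt.T                              # cosine distance, (N, M)
    cost = torch.cat([cost, slack_cost * torch.ones(cost.size(0), 1)], dim=1)

    K = torch.exp(-cost / eps)                            # Gibbs kernel
    a = torch.full((cost.size(0),), 1.0 / cost.size(0))   # uniform row marginal
    b = torch.full((cost.size(1),), 1.0 / cost.size(1))   # uniform column marginal
    u = torch.ones_like(a)
    for _ in range(n_iters):                              # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)              # transport plan
    return (plan * cost).sum()                            # alignment cost to minimize

loss = partial_ot_align_loss(torch.randn(32, 256), torch.randn(8, 256))
```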

Citations: 0
M-TabNet: A Transformer-Based Multi-Encoder for Early Neonatal Birth Weight Prediction Using Multimodal Data.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3614285
Muhammad Mursil, Hatem A Rashwan, Luis Santos-Calderon, Pere Cavalle-Busquets, Michelle M Murphy, Domenec Puig

Birth weight (BW) is a key indicator of neonatal health, and low birth weight (LBW) is linked to increased mortality and morbidity. Early prediction of BW facilitates timely prevention of impaired foetal growth. However, available techniques such as ultrasonography have limitations, including less accuracy when applied before 20 weeks of gestation and operator-dependent variability. Existing BW prediction models often neglect nutritional and genetic influences, and focus mainly on physiological and lifestyle factors. This study presents an attention-based transformer model with a multi-encoder architecture for early (<12 weeks) BW prediction. Our model effectively integrates diverse maternal data, including physiological, lifestyle, nutritional, and genetic data, addressing limitations seen in previous attention-based models such as TabNet. The model achieves a Mean Absolute Error (MAE) of 122 grams and an R² value of 0.94, showing its high predictive accuracy and interoperability with our in-house private dataset. Independent validation confirms generalizability (MAE: 105 grams, R²: 0.95) with the IEEE children dataset. To enhance clinical utility, predicted BW is classified into low and normal categories, achieving a sensitivity of 97.55% and a specificity of 94.48%, facilitating early risk stratification. Model interpretability is reinforced through feature importance and SHAP analysis, highlighting significant influences of maternal age, tobacco exposure, and vitamin B12 status, with genetic factors playing a secondary role. Our results emphasize the potential of advanced deep learning models to improve early BW prediction, offering a robust, interpretable, and personalized tool to identify pregnancies at risk and optimize neonatal outcomes.
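A minimal sketch of the multi-encoder idea, with made-up group sizes and dimensions: each maternal data group (physiological, lifestyle, nutritional, genetic) gets its own encoder, a transformer fuses the per-modality tokens, and a head regresses birth weight. This illustrates the design pattern, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiEncoderRegressor(nn.Module):
    """One MLP encoder per maternal data group; a transformer fuses the
    resulting modality tokens and a head regresses birth weight (grams)."""

    def __init__(self, group_dims=(10, 8, 15, 6), d_model=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
            for d in group_dims  # physiological, lifestyle, nutritional, genetic
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, groups):                       # list of (B, d_i) tensors
        tokens = torch.stack([enc(g) for enc, g in zip(self.encoders, groups)], dim=1)
        fused = self.fusion(tokens).mean(dim=1)      # pool modality tokens
        return self.head(fused).squeeze(-1)          # predicted BW in grams

model = MultiEncoderRegressor()
bw = model([torch.randn(4, 10), torch.randn(4, 8), torch.randn(4, 15), torch.randn(4, 6)])
```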

Citations: 0
Tamper Detection and Self-Recovery in a Visual Secret Sharing Based Security Mechanism for Medical Records.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2024.3424334
Ajmal Mohammed, P Samundiswary

Medical records contain highly sensitive patient information. These medical records are significant for better research, diagnosis, and treatment. However, ensuring secure medical records storage is paramount to protect patient confidentiality, integrity, and privacy. Conventional methods involve encrypting and storing medical records in third-party clouds. Such storage enables convenient access and remote consultation. This cloud storage poses single-point attack risks and may lead to erroneous diagnoses and treatment. To address this, a novel (n, n) VSS scheme is proposed with data embedding, a permutation ordered binary number system, tamper detection, and a self-recovery mechanism. This approach enables the reconstruction of medical records even in the case of tampering. The tamper detection algorithm ensures data integrity. Simulation results demonstrate the superiority of the proposed method in terms of security and reconstruction quality. Here, security analysis is done by considering attacks such as brute force, differential, and tampering attacks. Similarly, the reconstruction quality is evaluated using various human visual system parameters. The results show that the proposed technique provides a lower bit error rate (≈0), high average peak signal-to-noise ratio (≈35 dB), high structured similarity (≈1), high text embedding rate (≈0.7 BPP), and lossless reconstruction in the case of attacks.
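The full scheme (POB encoding, tamper detection, self-recovery) is not reproduced here, but the core (n, n) property, that all n shares are required for reconstruction, can be illustrated with a minimal XOR-based sharing sketch:

```python
import numpy as np

def make_shares(secret: np.ndarray, n: int, rng=None):
    """(n, n) XOR secret sharing: n-1 random shares plus one share that XORs
    the secret with all of them; any subset of fewer than n shares reveals
    nothing about the secret."""
    rng = rng or np.random.default_rng(0)
    shares = [rng.integers(0, 256, secret.shape, dtype=np.uint8) for _ in range(n - 1)]
    last = secret.copy()
    for s in shares:
        last ^= s
    return shares + [last]

def reconstruct(shares):
    out = np.zeros_like(shares[0])
    for s in shares:
        out ^= s
    return out

secret = np.random.default_rng(1).integers(0, 256, (64, 64), dtype=np.uint8)
shares = make_shares(secret, n=4)
assert np.array_equal(reconstruct(shares), secret)          # all n shares recover it
assert not np.array_equal(reconstruct(shares[:3]), secret)  # fewer shares do not
```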

Citations: 0
FedVGM: Enhancing Federated Learning Performance on Multi-Dataset Medical Images With XAI.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3600361
Mst Sazia Tahosin, Md Alif Sheakh, Mohammad Jahangir Alam, Md Mehedi Hassan, Anupam Kumar Bairagi, Shahab Abdulla, Samah Alshathri, Walid El-Shafai

Advances in deep learning have transformed medical imaging, yet progress is hindered by data privacy regulations and fragmented datasets across institutions. To address these challenges, we propose FedVGM, a privacy-preserving federated learning framework for multi-modal medical image analysis. FedVGM integrates four imaging modalities, including brain MRI, breast ultrasound, chest X-ray, and lung CT, across 14 diagnostic classes without centralizing patient data. Using transfer learning and an ensemble of VGG16 and MobileNetV2, FedVGM achieves 97.7% ± 0.01 accuracy on the combined dataset and 91.9-99.1% across individual modalities. We evaluated three aggregation strategies and demonstrated median aggregation to be the most effective. To ensure clinical interpretability, we apply explainable AI techniques and validate results through performance metrics, statistical analysis, and k-fold cross-validation. FedVGM offers a robust, scalable solution for collaborative medical diagnostics, supporting clinical deployment while preserving data privacy.
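The median aggregation strategy reported as most effective can be sketched as a coordinate-wise median over client weights; this is an illustrative round with a toy model, not the FedVGM code.

```python
import torch

def median_aggregate(client_states):
    """Coordinate-wise median of client model weights: for each parameter
    tensor, stack the clients' copies and take the element-wise median."""
    global_state = {}
    for key in client_states[0]:
        stacked = torch.stack([cs[key].float() for cs in client_states])
        global_state[key] = stacked.median(dim=0).values
    return global_state

# toy round: three clients sharing a tiny model's weights
model = torch.nn.Linear(4, 2)
clients = [{k: v + 0.1 * i for k, v in model.state_dict().items()} for i in range(3)]
model.load_state_dict(median_aggregate(clients))
```

Compared with plain averaging, the element-wise median is less sensitive to a single outlier client, which is one common reason it can outperform mean aggregation on heterogeneous datasets.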

Citations: 0
Continuous Cuffless Blood Pressure Estimation via Effective and Efficient Broad Learning Model.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3604464
Chunlin Zhang, Pingyu Hu, Zhan Shen, Xiaorong Ding

Hypertension is a critical cardiovascular risk factor, underscoring the necessity of accessible blood pressure (BP) monitoring for its prevention, detection, and management. While cuffless BP estimation using wearable cardiovascular signals via deep learning models (DLMs) offers a promising solution, their implementation often entails high computational costs. This study addresses these challenges by proposing an end-to-end broad learning model (BLM) for efficient cuffless BP estimation. Unlike DLMs that prioritize network depth, the BLM increases network width, thereby reducing computational complexity and enhancing training efficiency for continuous BP estimation. An incremental learning mode is also explored to provide high memory efficiency and flexibility. Validation on the University of California Irvine (UCI) database (403.67 hours) demonstrated that the standard BLM (SBLM) achieved a mean absolute error (MAE) of 11.72 mmHg for arterial BP (ABP) waveform estimation, performance comparable to DLMs such as long short-term memory (LSTM) and the one-dimensional convolutional neural network (1D-CNN), while improving training efficiency by 25.20 times. The incremental BLM (IBLM) offered horizontal scalability by expanding through node addition in a single layer, maintaining predictive performance while reducing storage demands through support for incremental learning with streaming or partial datasets. For systolic and diastolic BP prediction, the SBLM achieved MAEs (mean error ± standard deviation) of 3.04 mmHg (2.85 ± 4.15 mmHg) and 2.57 mmHg (-2.47 ± 3.03 mmHg), respectively. This study highlights the potential of BLM for personalized, real-time, continuous cuffless BP monitoring, presenting a practical solution for healthcare applications.
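A minimal broad learning system illustrating the width-over-depth idea: random mapped feature nodes, tanh enhancement nodes, and a closed-form ridge readout, so training reduces to one linear solve instead of backpropagation. Node counts and the regularizer are assumptions; the incremental (IBLM) node-addition update is omitted.

```python
import numpy as np

class BroadLearner:
    """Minimal broad learning system: random feature nodes, tanh enhancement
    nodes, and a ridge-regression readout solved in closed form."""

    def __init__(self, n_feat=100, n_enh=200, reg=1e-3, seed=0):
        self.n_feat, self.n_enh, self.reg = n_feat, n_enh, reg
        self.rng = np.random.default_rng(seed)

    def _expand(self, X):
        Z = X @ self.Wf + self.bf                  # mapped feature nodes
        H = np.tanh(Z @ self.We + self.be)         # enhancement nodes (width, not depth)
        return np.hstack([Z, H])

    def fit(self, X, y):
        d = X.shape[1]
        self.Wf = self.rng.normal(size=(d, self.n_feat))
        self.bf = self.rng.normal(size=self.n_feat)
        self.We = self.rng.normal(size=(self.n_feat, self.n_enh))
        self.be = self.rng.normal(size=self.n_enh)
        A = self._expand(X)
        # ridge solution: W = (A^T A + reg * I)^-1 A^T y
        self.W = np.linalg.solve(A.T @ A + self.reg * np.eye(A.shape[1]), A.T @ y)
        return self

    def predict(self, X):
        return self._expand(X) @ self.W

X, y = np.random.randn(500, 8), np.random.randn(500)   # stand-ins for signal features / BP
print(BroadLearner().fit(X, y).predict(X[:3]))
```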

Citations: 0
A Hybrid Deep Learning Approach for Epileptic Seizure Detection in EEG signals.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2023.3265983
Ijaz Ahmad, Xin Wang, Danish Javeed, Prabhat Kumar, Oluwarotimi Williams Samuel, Shixiong Chen

Early detection and proper treatment of epilepsy is essential and meaningful to those who suffer from this disease. The adoption of deep learning (DL) techniques for automated epileptic seizure detection using electroencephalography (EEG) signals has shown great potential in making the most appropriate and fast medical decisions. However, DL algorithms have high computational complexity and suffer from low accuracy on imbalanced medical data in multi-class seizure-classification tasks. Motivated by the aforementioned challenges, we present a simple and effective hybrid DL approach for epileptic seizure detection in EEG signals. Specifically, first we use a K-means synthetic minority oversampling technique (SMOTE) to balance the sampling data. Second, we integrate a 1D convolutional neural network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network based on Truncated Backpropagation Through Time (TBPTT) to efficiently extract spatial and temporal sequence information while reducing computational complexity. Finally, the proposed DL architecture uses softmax and sigmoid classifiers at the classification layer to perform multi- and binary seizure-classification tasks. In addition, the 10-fold cross-validation technique is performed to show the significance of the proposed DL approach. Experimental results using the publicly available UCI epileptic seizure recognition dataset show better performance in terms of precision, sensitivity, specificity, and F1-score over some baseline DL algorithms and recent state-of-the-art techniques.
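The two building blocks can be sketched as follows, with assumed hyperparameters and without the TBPTT detail: imbalanced-learn's KMeansSMOTE for class balancing, and a 1D CNN feeding a BiLSTM for classification (the 178-sample window matches the UCI dataset's segment length).

```python
import torch
import torch.nn as nn
from imblearn.over_sampling import KMeansSMOTE   # balances classes before training

class CnnBiLstm(nn.Module):
    """1D CNN front-end for local spatial features, BiLSTM for temporal
    context, and a linear head producing multi-class seizure logits."""

    def __init__(self, n_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, x):                         # x: (B, 1, T) raw EEG window
        feats = self.cnn(x).transpose(1, 2)       # (B, T/2, 32)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])              # logits from last time step

# X: (n_samples, 178) EEG segments, y: labels -- e.g. the UCI seizure dataset
# X_bal, y_bal = KMeansSMOTE(random_state=0).fit_resample(X, y)
logits = CnnBiLstm()(torch.randn(8, 1, 178))
```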

Citations: 0
CNNViT-MILF-a: A Novel Architecture Leveraging the Synergy of CNN and ViT for Motor Imagery Classification.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3587026
Zhenxi Zhao, Yingyu Cao, Hongbin Yu, Huixian Yu, Junfen Huang

Accurate motor imagery (MI) classification in EEG-based brain-computer interfaces (BCIs) is essential for applications in engineering, medicine, and artificial intelligence. Due to the limitations of single-model approaches, hybrid model architectures have emerged as a promising direction. In particular, convolutional neural networks (CNNs) and vision transformers (ViTs) demonstrate strong complementary capabilities, leading to enhanced performance. This study proposes a series of novel models, termed CNNViT-MI, to explore the synergy of CNNs and ViTs for MI classification. Specifically, five fusion strategies were defined: parallel integration, sequential integration, hierarchical integration, early fusion, and late fusion. Based on these strategies, eight candidate models were developed. Experiments were conducted on four datasets: BCI competition IV dataset 2a, BCI competition IV dataset 2b, high gamma dataset, and a self-collected MI-GS dataset. The results demonstrate that CNNViT-MILF-a achieves the best performance among all candidates by leveraging ViT as the backbone for global feature extraction and incorporating CNN-based local representations through a late fusion strategy. Compared to the best-performing state-of-the-art (SOTA) methods, mean accuracy was improved by 2.27%, 2.31%, 0.74%, and 2.50% on the respective datasets, confirming the model's effectiveness and broad applicability; other metrics showed similar improvements. In addition, significance analysis, ablation studies, and visualization analysis were conducted, and corresponding clinical integration and rehabilitation protocols were developed to support practical use in healthcare.
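A minimal late-fusion sketch in the spirit of CNNViT-MILF-a, with illustrative dimensions: a CNN branch and a small transformer branch each produce class logits, which are averaged at the end. This is not the authors' architecture; channel counts, depths, and the averaging rule are assumptions.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """CNN branch for local patterns, transformer branch for global context;
    late fusion averages the two heads' logits."""

    def __init__(self, n_ch=22, n_t=256, d=64, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, (1, 25), padding=(0, 12)), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_classes),
        )
        self.patch = nn.Linear(n_ch, d)                  # one token per time step
        enc = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.vit = nn.TransformerEncoder(enc, num_layers=2)
        self.vit_head = nn.Linear(d, n_classes)

    def forward(self, x):                    # x: (B, n_ch, n_t) EEG trial
        c = self.cnn(x.unsqueeze(1))                     # CNN-branch logits
        t = self.vit(self.patch(x.transpose(1, 2)))      # (B, n_t, d)
        v = self.vit_head(t.mean(dim=1))                 # transformer-branch logits
        return (c + v) / 2                               # late fusion

logits = LateFusionNet()(torch.randn(8, 22, 256))
```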

Citations: 0
Enhancing Psychological Assessments With Open-Ended Questionnaires and Large Language Models: An ASD Case Study.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3599643
Alberto Altozano, Maria Eleonora Minissi, Lucia Gomez-Zaragoza, Luna Maddalon, Mariano Alcaniz, Javier Marin-Morales

Open-ended questionnaires allow respondents to express freely, capturing richer information than close-ended formats, but they are harder to analyze. Recent natural language processing advancements enable automatic assessment of open-ended responses, yet such assessment remains underexplored in psychological classification. This study proposes a methodology using pre-trained large language models (LLMs) for automatic classification of open-ended questionnaires, applied to autism spectrum disorder (ASD) classification via parental reports. We compare multiple training strategies using transcribed responses from 51 parents (26 with typically developing children, 25 with ASD), exploring variations in model fine-tuning, input representation, and specificity. Subject-level predictions are derived by aggregating 12 individual question responses. Our best approach achieved 84% subject-wise accuracy and 1.0 ROC-AUC using an OpenAI embedding model, per-question training, including questions in the input, and combining the predictions with a voting system. In addition, a zero-shot evaluation using GPT-4o was conducted, yielding comparable results, underscoring the potential of both compact, local models and large out-of-the-box LLMs. To enhance transparency, we explored interpretability methods. Proprietary LLMs like GPT-4o offered no direct explanation, and OpenAI embedding models showed limited interpretability. However, locally deployable LLMs provided the highest interpretability. This highlights a trade-off between proprietary models' performance and local models' explainability. Our findings validate LLMs for automatically classifying open-ended questionnaires, offering a scalable, cost-effective complement for ASD assessment. These results suggest broader applicability for psychological analysis of other conditions, advancing LLM use in mental health research.
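The per-question training and voting pipeline can be sketched as below. The embedder is a random stub standing in for the sentence-embedding model the study used (an OpenAI embedding model), and all names and sizes are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

N_QUESTIONS = 12

def embed(texts):
    """Stand-in embedder: replace with a real sentence-embedding model.
    This random stub only keeps the sketch self-contained and runnable."""
    rng = np.random.default_rng(42)
    return rng.normal(size=(len(texts), 128))

# answers[q] is the list of transcribed replies to question q, one per parent
answers = [[f"reply of parent {s} to question {q}" for s in range(51)]
           for q in range(N_QUESTIONS)]
labels = np.array([0] * 26 + [1] * 25)        # 26 typically developing, 25 ASD

# one classifier per question, trained on that question's response embeddings
models = [LogisticRegression(max_iter=1000).fit(embed(answers[q]), labels)
          for q in range(N_QUESTIONS)]

def predict_subject(subject_answers):
    """Aggregate the 12 per-question predictions with a majority vote."""
    votes = [m.predict(embed([a]))[0] for m, a in zip(models, subject_answers)]
    return int(np.mean(votes) >= 0.5)
```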

Citations: 0
Multimodal Graph Learning With Multi-Hypergraph Reasoning Networks for Focal Liver Lesion Classification in Multimodal Magnetic Resonance Imaging.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3639185
Shaocong Mo, Ming Cai, Lanfen Lin, Ruofeng Tong, Fang Wang, Qingqing Chen, Wenbin Ji, Yinhao Li, Hongjie Hu, Yen-Wei Chen

Multimodal magnetic resonance imaging (MRI) is instrumental in differentiating liver lesions. The major challenge involves modeling reliable connections and simultaneously learning complementary information across various MRI sequences. While previous studies have primarily focused on multimodal integration in a pair-wise manner using few modalities, our research seeks to advance a more comprehensive understanding of interaction modeling by establishing complex high-order correlations among the diverse modalities in multimodal MRI. In this paper, we introduce a multimodal graph-learning framework with a multi-hypergraph reasoning network to capture the full spectrum of both pair-wise and group-wise relationships among different modalities. Specifically, a weight-shared encoder extracts features from regions of interest (ROI) images across all modalities. Subsequently, a collection of uniform hypergraphs is constructed with varying vertex configurations, allowing for the modeling of not only pair-wise correlations but also the high-order collaborations for relational reasoning. Following information propagation through the hypergraph message passing, an adaptive intra-modality fusion module is proposed to effectively fuse feature representations from different hypergraphs of the same modality. Finally, all refined features are concatenated to prepare for the classification task. Our experimental evaluations, including focal liver lesion classification using the LLD-MMRI2023 dataset and early recurrence prediction of hepatocellular carcinoma using our internal datasets, demonstrate that our method significantly surpasses the performance of existing approaches, indicating the effectiveness of our model in handling both pair-wise and group-wise interactions across multiple modalities.
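The hyperedge message passing can be illustrated with a standard hypergraph convolution step (vertex → hyperedge → vertex averaging, then a linear transform); the paper's uniform-hypergraph construction and intra-modality fusion modules are not reproduced, and all shapes here are assumptions.

```python
import torch

def hypergraph_conv(X, H, Theta):
    """One hypergraph message-passing step in the common HGNN form.
    X: (n_vertices, d) vertex features, H: (n_vertices, n_edges) incidence
    matrix, Theta: (d, d_out) learnable transform."""
    Dv = H.sum(dim=1).clamp(min=1)                 # vertex degrees
    De = H.sum(dim=0).clamp(min=1)                 # hyperedge degrees
    edge_msg = (H.T @ X) / De.unsqueeze(1)         # average member vertices per edge
    vert_msg = (H @ edge_msg) / Dv.unsqueeze(1)    # average incident edges per vertex
    return torch.relu(vert_msg @ Theta)

# 8 ROI feature vectors (one per MRI sequence), 3 hyperedges grouping them
X = torch.randn(8, 32)
H = torch.zeros(8, 3)
H[:4, 0] = 1; H[2:6, 1] = 1; H[4:, 2] = 1          # overlapping group-wise relations
out = hypergraph_conv(X, H, torch.randn(32, 16))
```

Because each hyperedge can contain more than two vertices, a single message-passing step already mixes group-wise (high-order) information, which is the property the abstract contrasts with pair-wise graph edges.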

Citations: 0
Robust Remote Heart Rate Estimation Network Based on Spatial-Temporal-Channel Learning From Facial Videos.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3597997
Jun Yang, Chen Zhu, Renbiao Wu

Non-contact heart rate detection technology leverages changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face various limitations, including restricted feature extraction capabilities, redundant spatial information, and ineffective motion artifact processing. To address these problems, a novel end-to-end heart rate estimation network, Spatial-Temporal-Channel Network (STCNet), is proposed. Firstly, in order to solve the problem of redundant spatial information in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight the effective information of the facial region. Next, an improved temporal shift module (TSMP) with long-range temporal information perception is proposed. On this basis, a temporal-channel learning (TCL) unit is designed to achieve the interaction of information across different frames' channels, aiming to address the insufficient capability of existing models in extracting periodic features of heartbeat. Finally, combining the SAL and TCL units, a feature extraction block (FEB) is designed. A feature extraction network is constructed by stacking multiple layers of FEBs to achieve accurate heart rate estimation. Numerous experiments are conducted on the UBFC-rPPG dataset and the PURE dataset to verify the effectiveness and generalization ability of our model. Notably, compared to the state-of-the-art CIN-rPPG, our model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE), in intra-dataset testing on the PURE dataset. Experimental results demonstrate that our proposed model outperforms other mainstream models.
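The improved TSMP is not detailed in the abstract, but the classic temporal-shift operation it builds on is easy to show: a fraction of channels is shifted one frame backward and another fraction one frame forward, so subsequent per-frame layers see neighboring frames' features at zero extra FLOPs. A minimal sketch, with the shift fraction assumed:

```python
import torch

def temporal_shift(x, fold_div=8):
    """Classic temporal shift (as in TSM): move 1/fold_div of the channels
    one frame back, another 1/fold_div one frame forward, leave the rest.
    x: (B, T, C, H, W) video clip features."""
    B, T, C, H, W = x.shape
    fold = C // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift toward the past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift toward the future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # untouched channels
    return out

clip = torch.randn(2, 16, 64, 32, 32)   # 16-frame facial-video feature maps
shifted = temporal_shift(clip)
```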

Citations: 0