Advances in deep learning have transformed medical imaging, yet progress is hindered by data privacy regulations and fragmented datasets across institutions. To address these challenges, we propose FedVGM, a privacy-preserving federated learning framework for multi-modal medical image analysis. FedVGM integrates four imaging modalities (brain MRI, breast ultrasound, chest X-ray, and lung CT) across 14 diagnostic classes without centralizing patient data. Using transfer learning and an ensemble of VGG16 and MobileNetV2, FedVGM achieves 97.7% $\pm$ 0.01 accuracy on the combined dataset and 91.9-99.1% across individual modalities. We evaluate three aggregation strategies and show that median aggregation is the most effective. To ensure clinical interpretability, we apply explainable AI techniques and validate results through performance metrics, statistical analysis, and k-fold cross-validation. FedVGM offers a robust, scalable solution for collaborative medical diagnostics, supporting clinical deployment while preserving data privacy.
{"title":"FedVGM: Enhancing Federated Learning Performance on Multi-Dataset Medical Images With XAI.","authors":"Mst Sazia Tahosin, Md Alif Sheakh, Mohammad Jahangir Alam, Md Mehedi Hassan, Anupam Kumar Bairagi, Shahab Abdulla, Samah Alshathri, Walid El-Shafai","doi":"10.1109/JBHI.2025.3600361","DOIUrl":"10.1109/JBHI.2025.3600361","url":null,"abstract":"<p><p>Advances in deep learning have transformed medical imaging, yet progress is hindered by data privacy regulations and fragmented datasets across institutions. To address these challenges, we propose FedVGM, a privacy-preserving federated learning framework for multi-modal medical image analysis. FedVGM integrates four imaging modalities, including brain MRI, breast ultrasound, chest X-ray, and lung CT, across 14 diagnostic classes without centralizing patient data. Using transfer learning and an ensemble of VGG16 and MobileNetV2, FedVGM achieves 97.7% $pm$ 0.01 accuracy on the combined dataset and 91.9-99.1% across individual modalities. We evaluated three aggregation strategies and demonstrated median aggregation to be the most effective. To ensure clinical interpretability, we apply explainable AI techniques and validate results through performance metrics, statistical analysis, and k-fold cross-validation. FedVGM offers a robust, scalable solution for collaborative medical diagnostics, supporting clinical deployment while preserving data privacy.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1272-1285"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hypertension is a critical cardiovascular risk factor, underscoring the necessity of accessible blood pressure (BP) monitoring for its prevention, detection, and management. While cuffless BP estimation using wearable cardiovascular signals via deep learning models (DLMs) offers a promising solution, their implementation often entails high computational costs. This study addresses these challenges by proposing an end-to-end broad learning model (BLM) for efficient cuffless BP estimation. Unlike DLMs that prioritize network depth, the BLM increases network width, thereby reducing computational complexity and enhancing training efficiency for continuous BP estimation. An incremental learning mode is also explored to provide high memory efficiency and flexibility. Validation on the University of California Irvine (UCI) database (403.67 hours) demonstrated that the standard BLM (SBLM) achieved a mean absolute error (MAE) of 11.72 mmHg for arterial BP (ABP) waveform estimation, performance comparable to DLMs such as long short-term memory (LSTM) and the one-dimensional convolutional neural network (1D-CNN), while improving training efficiency by 25.20 times. The incremental BLM (IBLM) offered horizontal scalability by expanding through node addition in a single layer, maintaining predictive performance while reducing storage demands through support for incremental learning with streaming or partial datasets. For systolic and diastolic BP prediction, the SBLM achieved MAEs (mean error $\pm$ standard deviation) of 3.04 mmHg (2.85 $\pm$ 4.15 mmHg) and 2.57 mmHg (-2.47 $\pm$ 3.03 mmHg), respectively. This study highlights the potential of BLM for personalized, real-time, continuous cuffless BP monitoring, presenting a practical solution for healthcare applications.
{"title":"Continuous Cuffless Blood Pressure Estimation via Effective and Efficient Broad Learning Model.","authors":"Chunlin Zhang, Pingyu Hu, Zhan Shen, Xiaorong Ding","doi":"10.1109/JBHI.2025.3604464","DOIUrl":"10.1109/JBHI.2025.3604464","url":null,"abstract":"<p><p>Hypertension is a critical cardiovascular risk factor, underscoring the necessity of accessible blood pressure (BP) monitoring for its prevention, detection, and management. While cuffless BP estimation using wearable cardiovascular signals via deep learning models (DLMs) offers a promising solution, their implementation often entails high computational costs. This study addresses these challenges by proposing an end-to-end broad learning model (BLM) for efficient cuffless BP estimation. Unlike DLMs that prioritize network depth, the BLM increases network width, thereby reducing computational complexity and enhancing training efficiency for continuous BP estimation. An incremental learning mode is also explored to provide high memory efficiency and flexibility. Validation on the University of California Irvine (UCI) database (403.67 hours) demonstrated that the standard BLM (SBLM) achieved a mean absolute error (MAE) of 11.72 mmHg for arterial BP (ABP) waveform estimation, performance comparable to DLMs such as long short-term memory (LSTM) and the one-dimensional convolutional neural network (1D-CNN), while improving training efficiency by 25.20 times. The incremental BLM (IBLM) offered horizontal scalability by expanding through node addition in a single layer, maintaining predictive performance while reducing storage demands through support for incremental learning with streaming or partial datasets. For systolic and diastolic BP prediction, the SBLM achieved MAEs (mean error $pm$ standard deviation) of 3.04 mmHg (2.85 $pm$ 4.15 mmHg) and 2.57 mmHg (-2.47 $pm$ 3.03 mmHg), respectively. This study highlights the potential of BLM for personalized, real-time, continuous cuffless BP monitoring, presenting a practical solution for healthcare applications.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1101-1114"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early detection and proper treatment of epilepsy are essential and meaningful to those who suffer from this disease. The adoption of deep learning (DL) techniques for automated epileptic seizure detection using electroencephalography (EEG) signals has shown great potential for making appropriate and fast medical decisions. However, DL algorithms have high computational complexity and suffer from low accuracy with imbalanced medical data in the multi-class seizure-classification task. Motivated by these challenges, we present a simple and effective hybrid DL approach for epileptic seizure detection in EEG signals. Specifically, we first use K-means synthetic minority oversampling (K-means SMOTE) to balance the data. Second, we integrate a 1D convolutional neural network (CNN) with a bidirectional long short-term memory (BiLSTM) network trained with truncated backpropagation through time (TBPTT) to efficiently extract spatial and temporal sequence information while reducing computational complexity. Finally, the proposed DL architecture uses softmax and sigmoid classifiers at the classification layer to perform the multi-class and binary seizure-classification tasks. In addition, 10-fold cross-validation is performed to show the significance of the proposed DL approach. Experimental results on the publicly available UCI epileptic seizure recognition dataset show better performance in terms of precision, sensitivity, specificity, and F1-score than baseline DL algorithms and recent state-of-the-art techniques.
{"title":"A Hybrid Deep Learning Approach for Epileptic Seizure Detection in EEG signals.","authors":"Ijaz Ahmad, Xin Wang, Danish Javeed, Prabhat Kumar, Oluwarotimi Williams Samuel, Shixiong Chen","doi":"10.1109/JBHI.2023.3265983","DOIUrl":"10.1109/JBHI.2023.3265983","url":null,"abstract":"<p><p>Early detection and proper treatment of epilepsy is essential and meaningful to those who suffer from this disease. The adoption of deep learning (DL) techniques for automated epileptic seizure detection using electroencephalography (EEG) signals has shown great potential in making the most appropriate and fast medical decisions. However, DL algorithms have high computational complexity and suffer low accuracy with imbalanced medical data in multi seizure-classification task. Motivated from the aforementioned challenges, we present a simple and effective hybrid DL approach for epileptic seizure detection in EEG signals. Specifically, first we use a K-means Synthetic minority oversampling technique (SMOTE) to balance the sampling data. Second, we integrate a 1D convolutional neural network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network based on Truncated Backpropagation Through Time (TBPTT) to efficiently extract spatial and temporal sequence information while reducing computational complexity. Finally, the proposed DL architecture uses softmax and sigmoid classifiers at the classification layer to perform multi and binary seizure-classification tasks. In addition, the 10-fold cross-validation technique is performed to show the significance of the proposed DL approach. Experimental results using the publicly available UCI epileptic seizure recognition data set shows better performance in terms of precision, sensitivity, specificity, and F1-score over some baseline DL algorithms and recent state-of-the-art techniques.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1019-1029"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9274448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate motor imagery (MI) classification in EEG-based brain-computer interfaces (BCIs) is essential for applications in engineering, medicine, and artificial intelligence. Due to the limitations of single-model approaches, hybrid model architectures have emerged as a promising direction. In particular, convolutional neural networks (CNNs) and vision transformers (ViTs) demonstrate strong complementary capabilities, leading to enhanced performance. This study proposes a series of novel models, termed CNNViT-MI, to explore the synergy of CNNs and ViTs for MI classification. Specifically, five fusion strategies were defined: parallel integration, sequential integration, hierarchical integration, early fusion, and late fusion. Based on these strategies, eight candidate models were developed. Experiments were conducted on four datasets: BCI Competition IV dataset 2a, BCI Competition IV dataset 2b, the high gamma dataset, and a self-collected MI-GS dataset. The results demonstrate that CNNViT-MILF-a achieves the best performance among all candidates by leveraging ViT as the backbone for global feature extraction and incorporating CNN-based local representations through a late fusion strategy. Compared to the best-performing state-of-the-art (SOTA) methods, mean accuracy was improved by 2.27%, 2.31%, 0.74%, and 2.50% on the respective datasets, confirming the model's effectiveness and broad applicability; other metrics showed similar improvements. In addition, significance analysis, ablation studies, and visualization analysis were conducted, and corresponding clinical integration and rehabilitation protocols were developed to support practical use in healthcare.
{"title":"CNNViT-MILF-a: A Novel Architecture Leveraging the Synergy of CNN and ViT for Motor Imagery Classification.","authors":"Zhenxi Zhao, Yingyu Cao, Hongbin Yu, Huixian Yu, Junfen Huang","doi":"10.1109/JBHI.2025.3587026","DOIUrl":"10.1109/JBHI.2025.3587026","url":null,"abstract":"<p><p>Accurate motor imagery (MI) classification in EEG-based brain-computer interfaces (BCIs) is essential for applications in engineering, medicine, and artificial intelligence. Due to the limitations of single-model approaches, hybrid model architectures have emerged as a promising direction. In particular, convolutional neural networks (CNNs) and vision transformers (ViTs) demonstrate strong complementary capabilities, leading to enhanced performance. This study proposes a series of novel models, termed as CNNViT-MI, to explore the synergy of CNNs and ViTs for MI classification. Specifically, five fusion strategies were defined: parallel integration, sequential integration, hierarchical integration, early fusion, and late fusion. Based on these strategies, eight candidate models were developed. Experiments were conducted on four datasets: BCI competition IV dataset 2a, BCI competition IV dataset 2b, high gamma dataset, and a self-collected MI-GS dataset. The results demonstrate that CNNViT-MILF-a achieves the best performance among all candidates by leveraging ViT as the backbone for global feature extraction and incorporating CNN-based local representations through a late fusion strategy. Compared to the best-performing state-of-the-art (SOTA) methods, mean accuracy was improved by 2.27%, 2.31%, 0.74%, and 2.50% on the respective datasets, confirming the model's effectiveness and broad applicability, other metrics showed similar improvements. In addition, significance analysis, ablation studies, and visualization analysis were conducted, and corresponding clinical integration and rehabilitation protocols were developed to support practical use in healthcare.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1153-1165"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3599643
Alberto Altozano, Maria Eleonora Minissi, Lucia Gomez-Zaragoza, Luna Maddalon, Mariano Alcaniz, Javier Marin-Morales
Open-ended questionnaires allow respondents to express themselves freely, capturing richer information than closed-ended formats, but they are harder to analyze. Recent natural language processing advancements enable automatic assessment of open-ended responses, yet their use in psychological classification is underexplored. This study proposes a methodology using pre-trained large language models (LLMs) for automatic classification of open-ended questionnaires, applied to autism spectrum disorder (ASD) classification via parental reports. We compare multiple training strategies using transcribed responses from 51 parents (26 with typically developing children, 25 with ASD), exploring variations in model fine-tuning, input representation, and specificity. Subject-level predictions are derived by aggregating 12 individual question responses. Our best approach achieved 84% subject-wise accuracy and 1.0 ROC-AUC using an OpenAI embedding model, per-question training, including questions in the input, and combining the predictions with a voting system. In addition, a zero-shot evaluation using GPT-4o was conducted, yielding comparable results, underscoring the potential of both compact, local models and large out-of-the-box LLMs. To enhance transparency, we explored interpretability methods. Proprietary LLMs like GPT-4o offered no direct explanation, and OpenAI embedding models showed limited interpretability. However, locally deployable LLMs provided the highest interpretability. This highlights a trade-off between proprietary models' performance and local models' explainability. Our findings validate LLMs for automatically classifying open-ended questionnaires, offering a scalable, cost-effective complement for ASD assessment. These results suggest broader applicability for psychological analysis of other conditions, advancing LLM use in mental health research.
{"title":"Enhancing Psychological Assessments With Open-Ended Questionnaires and Large Language Models: An ASD Case Study.","authors":"Alberto Altozano, Maria Eleonora Minissi, Lucia Gomez-Zaragoza, Luna Maddalon, Mariano Alcaniz, Javier Marin-Morales","doi":"10.1109/JBHI.2025.3599643","DOIUrl":"10.1109/JBHI.2025.3599643","url":null,"abstract":"<p><p>Open-ended questionnaires allow respondents to express freely, capturing richer information than close-ended formats, but they are harder to analyze. Recent natural language processing advancements enable automatic assessment of open-ended responses, yet its use in psychological classification is underexplored. This study proposes a methodology using pre-trained large language models (LLMs) for automatic classification of open-ended questionnaires, applied to autism spectrum disorder (ASD) classification via parental reports. We compare multiple training strategies using transcribed responses from 51 parents (26 with typically developing children, 25 with ASD), exploring variations in model fine-tuning, input representation, and specificity. Subject-level predictions are derived by aggregating 12 individual question responses. Our best approach achieved 84% subject-wise accuracy and 1.0 ROC-AUC using an OpenAI embedding model, per-question training, including questions in the input, and combining the predictions with a voting system. In addition, a zero-shot evaluation using GPT-4o was conducted, yielding comparable results, underscoring the potential of both compact, local models and large out-of-the-box LLMs. To enhance transparency, we explored interpretability methods. Proprietary LLMs like GPT-4o offered no direct explanation, and OpenAI embedding models showed limited interpretability. However, locally deployable LLMs provided the highest interpretability. This highlights a trade-off between proprietary models' performance and local models' explainability. Our findings validate LLMs for automatically classifying open-ended questionnaires, offering a scalable, cost-effective complement for ASD assessment. These results suggest broader applicability for psychological analysis of other conditions, advancing LLM use in mental health research.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1707-1720"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144859134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3639185
Shaocong Mo, Ming Cai, Lanfen Lin, Ruofeng Tong, Fang Wang, Qingqing Chen, Wenbin Ji, Yinhao Li, Hongjie Hu, Yen-Wei Chen
Multimodal magnetic resonance imaging (MRI) is instrumental in differentiating liver lesions. The major challenge involves modeling reliable connections and simultaneously learning complementary information across various MRI sequences. While previous studies have primarily focused on multimodal integration in a pair-wise manner using few modalities, our research seeks to advance a more comprehensive understanding of interaction modeling by establishing complex high-order correlations among the diverse modalities in multimodal MRI. In this paper, we introduce a multimodal graph learning method with multi-hypergraph reasoning networks to capture the full spectrum of both pair-wise and group-wise relationships among different modalities. Specifically, a weight-shared encoder extracts features from regions of interest (ROI) images across all modalities. Subsequently, a collection of uniform hypergraphs is constructed with varying vertex configurations, allowing for the modeling of not only pair-wise correlations but also high-order collaborations for relational reasoning. Following information propagation through hypergraph message passing, an adaptive intra-modality fusion module is proposed to effectively fuse feature representations from different hypergraphs of the same modality. Finally, all refined features are concatenated to prepare for the classification task. Our experimental evaluations, including focal liver lesion classification using the LLD-MMRI2023 dataset and early recurrence prediction of hepatocellular carcinoma using our internal datasets, demonstrate that our method significantly surpasses the performance of existing approaches, indicating the effectiveness of our model in handling both pair-wise and group-wise interactions across multiple modalities.
{"title":"Multimodal Graph Learning With Multi-Hypergraph Reasoning Networks for Focal Liver Lesion Classification in Multimodal Magnetic Resonance Imaging.","authors":"Shaocong Mo, Ming Cai, Lanfen Lin, Ruofeng Tong, Fang Wang, Qingqing Chen, Wenbin Ji, Yinhao Li, Hongjie Hu, Yen-Wei Chen","doi":"10.1109/JBHI.2025.3639185","DOIUrl":"10.1109/JBHI.2025.3639185","url":null,"abstract":"<p><p>Multimodal magnetic resonance imaging (MRI) is instrumental in differentiating liver lesions. The major challenge involves modeling reliable connections and simultaneously learning complementary information across various MRI sequences. While previous studies have primarily focused on multimodal integration in a pair-wise manner using few modalities, our research seeks to advance a more comprehensive understanding of interaction modeling by establishing complex high-order correlations among the diverse modalities in multimodal MRI. In this paper, we introduce a multimodal graph learning with multi-hypergraph reasoning network to capture the full spectrum of both pair-wise and group-wise relationships among different modalities. Specifically, a weight-shared encoder extracts features from regions of interest (ROI) images across all modalities. Subsequently, a collection of uniform hypergraphs are constructed with varying vertex configurations, allowing for the modeling of not only pair-wise correlations but also the high-order collaborations for relational reasoning. Following information propagation through the hypergraph message passing, adaptive intra-modality fusion module is proposed to effectively fuse feature representations from different hypergraphs of the same modality. Finally, all refined features are concatenated to prepare for the classification task. Our experimental evaluations, including focal liver lesions classification using the LLD-MMRI2023 dataset and early recurrence prediction of hepatocellular carcinoma using our internal datasets, demonstrate that our method significantly surpasses the performance of existing approaches, indicating the effectiveness of our model in handling both pair-wise and group-wise interactions across multiple modalities.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1404-1417"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145654299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3597997
Jun Yang, Chen Zhu, Renbiao Wu
Non-contact heart rate detection technology leverages changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face various limitations, including restricted feature extraction capabilities, redundant spatial information, and ineffective motion artifact processing. To address these problems, a novel end-to-end heart rate estimation network, the Spatial-Temporal-Channel Network (STCNet), is proposed. First, to address the redundant spatial information in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight the effective information of the facial region. Next, an improved temporal shift module (TSMP) with long-range temporal information perception is proposed. On this basis, a temporal-channel learning (TCL) unit is designed to enable information interaction across the channels of different frames, addressing the insufficient capability of existing models in extracting periodic heartbeat features. Finally, combining the SAL and TCL units, a feature extraction block (FEB) is designed, and a feature extraction network is constructed by stacking multiple FEB layers to achieve accurate heart rate estimation. Extensive experiments are conducted on the UBFC-rPPG and PURE datasets to verify the effectiveness and generalization ability of our model. Notably, compared to the state-of-the-art CIN-rPPG, our model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE) in intra-dataset testing on the PURE dataset. Experimental results demonstrate that our proposed model outperforms other mainstream models.
{"title":"Robust Remote Heart Rate Estimation Network Based on Spatial-Temporal-Channel Learning From Facial Videos.","authors":"Jun Yang, Chen Zhu, Renbiao Wu","doi":"10.1109/JBHI.2025.3597997","DOIUrl":"10.1109/JBHI.2025.3597997","url":null,"abstract":"<p><p>Non-contact heart rate detection technology leverages changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face various limitations, including restricted feature extraction capabilities, redundant spatial information, and ineffective motion artifact processing. To address these problems, a novel end-to-end heart rate estimation network, Spatial-Temporal-Channel Network (STCNet), is proposed. Firstly, in order to solve the problem of redundant spatial information in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight the effective information of the facial region. Next, an improved temporal shift module (TSMP) with long-range temporal information perception is proposed. On this basis, A temporal-channel learning (TCL) unit is designed to achieve the interaction of information across different frames' channels, aiming to address the insufficient capability of existing models in extracting periodic features of heartbeat. Finally, combining the SAL and TCL units, a feature extraction block (FEB) is designed. A feature extraction network is constructed by stacking multiple layers of FEBs to achieve accurate heart rate estimation. Numerous experiments are conducted on the UBFC-rPPG dataset and the PURE dataset to verify the effectiveness and generalization ability of our model. Notably, compared to the state-of-the-art CIN-rPPG, our model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE), in intra-dataset testing on the PURE dataset. Experimental results demonstrate that our proposed model outperforms other mainstream models.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1258-1271"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144834977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3610111
Jiaxing Xu, Mengcheng Lan, Xia Dong, Kai He, Wei Zhang, Qingtian Bian, Yiping Ke
Brain network analysis plays a crucial role in identifying distinctive patterns associated with neurological disorders. Functional magnetic resonance imaging (fMRI) enables the construction of brain networks by analyzing correlations in blood-oxygen-level-dependent (BOLD) signals across different brain regions, known as regions of interest (ROIs). These networks are typically constructed using atlases that parcellate the brain based on various hypotheses of functional and anatomical divisions. However, there is no standard atlas for brain network classification, which limits the detection of disorder-related abnormalities. Recent methods leveraging multiple atlases fail to ensure consistency across atlases and lack effective ROI-level information exchange, limiting their efficacy. To address these challenges, we propose the Atlas-Integrated Distillation and Fusion network (AIDFusion), a novel framework designed to enhance brain network classification using fMRI data. AIDFusion introduces a disentangle Transformer to filter out inconsistent atlas-specific information and distill meaningful cross-atlas connections. Additionally, it enforces subject- and population-level consistency constraints to improve cross-atlas coherence. To further enhance feature integration, AIDFusion incorporates an inter-atlas message-passing mechanism that facilitates the fusion of complementary information across brain regions. We evaluate AIDFusion on four resting-state fMRI datasets encompassing different neurological disorders. Experimental results demonstrate its superior classification performance and computational efficiency compared to state-of-the-art methods. Furthermore, a case study highlights AIDFusion's ability to extract interpretable patterns that align with established neuroscience findings, reinforcing its potential as a robust tool for multi-atlas brain network analysis.
The code is publicly available at https://github.com/AngusMonroe/AIDFusion.
{"title":"Multi-Atlas Brain Network Classification Through Consistency Distillation and Complementary Information Fusion.","authors":"Jiaxing Xu, Mengcheng Lan, Xia Dong, Kai He, Wei Zhang, Qingtian Bian, Yiping Ke","doi":"10.1109/JBHI.2025.3610111","DOIUrl":"10.1109/JBHI.2025.3610111","url":null,"abstract":"<p><p>Brain network analysis plays a crucial role in identifying distinctive patterns associated with neurological disorders. Functional magnetic resonance imaging (fMRI) enables the construction of brain networks by analyzing correlations in blood-oxygen-level-dependent (BOLD) signals across different brain regions, known as regions of interest (ROIs). These networks are typically constructed using atlases that parcellate the brain based on various hypotheses of functional and anatomical divisions. However, there is no standard atlas for brain network classification, leading to limitations in detecting abnormalities in disorders. Recent methods leveraging multiple atlases fail to ensure consistency across atlases and lack effective ROI-level information exchange, limiting their efficacy. To address these challenges, we propose the Atlas-Integrated Distillation and Fusion network (AIDFusion), a novel framework designed to enhance brain network classification using fMRI data. AIDFusion introduces a disentangle Transformer to filter out inconsistent atlas-specific information and distill meaningful cross-atlas connections. Additionally, it enforces subject- and population-level consistency constraints to improve cross-atlas coherence. To further enhance feature integration, AIDFusion incorporates an inter-atlas message-passing mechanism that facilitates the fusion of complementary information across brain regions. We evaluate AIDFusion on four resting-state fMRI datasets encompassing different neurological disorders. Experimental results demonstrate its superior classification performance and computational efficiency compared to state-of-the-art methods. Furthermore, a case study highlights AIDFusion's ability to extract interpretable patterns that align with established neuroscience findings, reinforcing its potential as a robust tool for multi-atlas brain network analysis.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1568-1579"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145075230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3649496
A S Panayides, H Chen, N D Filipovic, T Geroski, J Hou, K Lekadir, K Marias, G K Matsopoulos, G Papanastasiou, P Sarder, G Tourassi, S A Tsaftaris, H Fu, E Kyriacou, C P Loizou, M Zervakis, J H Saltz, F E Shamout, K C L Wong, J Yao, A Amini, D I Fotiadis, C S Pattichis, M S Pattichis
Over the past five years, artificial intelligence (AI) has introduced new models and methods for addressing the challenges associated with the broader adoption of AI models and systems in medicine. This paper reviews recent advances in AI for medical image and video analysis, outlines emerging paradigms, highlights pathways for successful clinical translation, and provides recommendations for future work. Hybrid Convolutional Neural Network (CNN)-Transformer architectures now deliver state-of-the-art results in segmentation, classification, reconstruction, synthesis, and registration. Foundation and generative AI models enable transfer learning to smaller datasets with limited ground truth. Federated learning supports privacy-preserving collaboration across institutions. Explainable and trustworthy AI approaches have become essential to foster clinician trust, ensure regulatory compliance, and facilitate ethical deployment. Together, these developments pave the way for integrating AI into radiology, pathology, and wider healthcare workflows.
{"title":"Position Paper: Artificial Intelligence in Medical Image Analysis: Advances, Clinical Translation, and Emerging Frontiers.","authors":"A S Panayides, H Chen, N D Filipovic, T Geroski, J Hou, K Lekadir, K Marias, G K Matsopoulos, G Papanastasiou, P Sarder, G Tourassi, S A Tsaftaris, H Fu, E Kyriacou, C P Loizou, M Zervakis, J H Saltz, F E Shamout, K C L Wong, J Yao, A Amini, D I Fotiadis, C S Pattichis, M S Pattichis","doi":"10.1109/JBHI.2025.3649496","DOIUrl":"10.1109/JBHI.2025.3649496","url":null,"abstract":"<p><p>Over the past five years, artificial intelligence (AI) has introduced new models and methods for addressing the challenges associated with the broader adoption of AI models and systems in medicine. This paper reviews recent advances in AI for medical image and video analysis, outlines emerging paradigms, highlights pathways for successful clinical translation, and provides recommendations for future work. Hybrid Convolutional Neural Network (CNN) Transformer architectures now deliver state-of-the-art results in segmentation, classification, reconstruction, synthesis, and registration. Foundation and generative AI models enable the use of transfer learning to smaller datasets with limited ground truth. Federated learning supports privacy-preserving collaboration across institutions. Explainable and trustworthy AI approaches have become essential to foster clinician trust, ensure regulatory compliance, and facilitate ethical deployment. Together, these developments pave the way for integrating AI into radiology, pathology, and wider healthcare workflows.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1187-1202"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145878178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2024.3373439
Huayu Li, Ana S Carreon-Rascon, Xiwen Chen, Geng Yuan, Ao Li
Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of Joint-Embedding SSL and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of Joint-Embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.
{"title":"MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features.","authors":"Huayu Li, Ana S Carreon-Rascon, Xiwen Chen, Geng Yuan, Ao Li","doi":"10.1109/JBHI.2024.3373439","DOIUrl":"10.1109/JBHI.2024.3373439","url":null,"abstract":"<p><p>Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of Joint-Embedding SSL and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of Joint-Embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"958-969"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140039274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}