Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3609739
Qiuhui Chen, Xuancheng Yao, Huping Ye, Yi Hong
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution- and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport based alignment, which is more tolerant of the noise potentially introduced by LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance.
{"title":"Enhancing 3D Medical Image Understanding With Pretraining Aided by 2D Multimodal Large Language Models.","authors":"Qiuhui Chen, Xuancheng Yao, Huping Ye, Yi Hong","doi":"10.1109/JBHI.2025.3609739","DOIUrl":"10.1109/JBHI.2025.3609739","url":null,"abstract":"<p><p>Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport based alignment, demonstrating greater tolerance to noise introduced by potential noises in LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate our state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1506-1519"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145069448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3614285
Muhammad Mursil, Hatem A Rashwan, Luis Santos-Calderon, Pere Cavalle-Busquets, Michelle M Murphy, Domenec Puig
Birth weight (BW) is a key indicator of neonatal health, and low birth weight (LBW) is linked to increased mortality and morbidity. Early prediction of BW facilitates timely prevention of impaired foetal growth. However, available techniques such as ultrasonography have limitations, including reduced accuracy when applied before 20 weeks of gestation and operator-dependent variability. Existing BW prediction models often neglect nutritional and genetic influences and focus mainly on physiological and lifestyle factors. This study presents an attention-based transformer model with a multi-encoder architecture for early ($< 12$ weeks) BW prediction. Our model effectively integrates diverse maternal data, including physiological, lifestyle, nutritional, and genetic data, addressing limitations seen in previous attention-based models such as TabNet. The model achieves a Mean Absolute Error (MAE) of 122 grams and an $R^{2}$ value of 0.94 on our in-house private dataset, demonstrating high predictive accuracy. Independent validation on the IEEE children dataset confirms generalizability (MAE: 105 grams, $R^{2}$: 0.95). To enhance clinical utility, predicted BW is classified into low and normal categories, achieving a sensitivity of 97.55% and a specificity of 94.48% and facilitating early risk stratification. Model interpretability is reinforced through feature importance and SHAP analysis, which highlight significant influences of maternal age, tobacco exposure, and vitamin B12 status, with genetic factors playing a secondary role. Our results emphasize the potential of advanced deep learning models to improve early BW prediction, offering a robust, interpretable, and personalized tool to identify pregnancies at risk and optimize neonatal outcomes.
{"title":"M-TabNet: A Transformer-Based Multi-Encoder for Early Neonatal Birth Weight Prediction Using Multimodal Data.","authors":"Muhammad Mursil, Hatem A Rashwan, Luis Santos-Calderon, Pere Cavalle-Busquets, Michelle M Murphy, Domenec Puig","doi":"10.1109/JBHI.2025.3614285","DOIUrl":"10.1109/JBHI.2025.3614285","url":null,"abstract":"<p><p>Birth weight (BW) is a key indicator of neonatal health, and low birth weight (LBW) is linked to increased mortality and morbidity. Early prediction of BW facilitates timely prevention of impaired foetal growth. However, available techniques such as ultrasonography have limitations, including less accuracy when applied before 20 weeks of gestation and operator-dependent variability. Existing BW prediction models often neglect nutritional and genetic influences, and focus mainly on physiological and lifestyle factors. This study presents an attention-based transformer model with a multi-encoder architecture for early ($< 12$ weeks) BW prediction. Our model effectively integrates diverse maternal data, including physiological, lifestyle, nutritional, and genetic data, addressing limitations seen in previous attention-based models such as TabNet. The model achieves a Mean Absolute Error (MAE) of 122 grams and an $R^{2}$ value of 0.94, showing its high predictive accuracy and interoperability with our in-house private dataset. Independent validation confirms generalizability (MAE: 105 grams, $R^{2}$: 0.95) with the IEEE children dataset. To enhance clinical utility, predicted BW is classified into low and normal categories, achieving a sensitivity of 97.55% and a specificity of 94.48%, facilitating early risk stratification. Model interpretability is reinforced through feature importance and SHAP analysis, highlighting significant influences of maternal age, tobacco exposure, and vitamin B12 status, with genetic factors playing a secondary role. Our results emphasize the potential of advanced deep learning models to improve early BW prediction, offering a robust, interpretable, and personalized tool to identify pregnancies at risk and optimize neonatal outcomes.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1642-1651"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145191639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2024.3424334
Ajmal Mohammed, P Samundiswary
Medical records contain highly sensitive patient information. These records are significant for better research, diagnosis, and treatment. However, ensuring secure storage of medical records is paramount to protect patient confidentiality, integrity, and privacy. Conventional methods involve encrypting and storing medical records in third-party clouds. Such storage enables convenient access and remote consultation, but it poses single-point attack risks and may lead to erroneous diagnoses and treatment. To address this, a novel (n, n) visual secret sharing (VSS) scheme is proposed with data embedding, a permutation ordered binary (POB) number system, tamper detection, and a self-recovery mechanism. This approach enables the reconstruction of medical records even in the case of tampering, and the tamper detection algorithm ensures data integrity. Simulation results demonstrate the superiority of the proposed method in terms of security and reconstruction quality. Security analysis considers attacks such as brute-force, differential, and tampering attacks, and reconstruction quality is evaluated using various human visual system parameters. The results show that the proposed technique provides a lower bit error rate ($\approx$0), a high average peak signal-to-noise ratio ($\approx$35 dB), high structural similarity ($\approx$1), a high text embedding rate ($\approx$0.7 BPP), and lossless reconstruction under attack.
{"title":"Tamper Detection and Self-Recovery in a Visual Secret Sharing Based Security Mechanism for Medical Records.","authors":"Ajmal Mohammed, P Samundiswary","doi":"10.1109/JBHI.2024.3424334","DOIUrl":"10.1109/JBHI.2024.3424334","url":null,"abstract":"<p><p>Medical records contain highly sensitive patient information. These medical records are significant for better research, diagnosis, and treatment. However, ensuring secure medical records storage is paramount to protect patient confidentiality, integrity, and privacy. Conventional methods involve encrypting and storing medical records in third-party clouds. Such storage enables convenient access and remote consultation. This cloud storage poses single-point attack risks and may lead to erroneous diagnoses and treatment. To address this, a novel (n,n)VSS scheme is proposed with data embedding, permutation ordered binary number system, tamper detection, and self-recovery mechanism. This approach enables the reconstruction of medical records even in the case of tampering. The tamper detection algorithm ensures data integrity. Simulation results demonstrate the superiority of proposed method in terms of security and reconstruction quality. Here, security analysis is done by considering attacks such as brute force, differential, and tampering attacks. Similarly, the reconstruction quality is evaluated using various human visual system parameters. The results show that the proposed technique provides a lower bit error rate ($approx$0), high average peak signal-to-noise ratio ($approx$35 dB), high structured similarity ($approx$1), high text embedding rate ($approx$0.7 BPP), and lossless reconstruction in the case of attacks.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"890-899"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141537866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in deep learning have transformed medical imaging, yet progress is hindered by data privacy regulations and fragmented datasets across institutions. To address these challenges, we propose FedVGM, a privacy-preserving federated learning framework for multi-modal medical image analysis. FedVGM integrates four imaging modalities, including brain MRI, breast ultrasound, chest X-ray, and lung CT, across 14 diagnostic classes without centralizing patient data. Using transfer learning and an ensemble of VGG16 and MobileNetV2, FedVGM achieves 97.7% $\pm$ 0.01 accuracy on the combined dataset and 91.9-99.1% across individual modalities. We evaluated three aggregation strategies and demonstrated median aggregation to be the most effective. To ensure clinical interpretability, we apply explainable AI techniques and validate results through performance metrics, statistical analysis, and k-fold cross-validation. FedVGM offers a robust, scalable solution for collaborative medical diagnostics, supporting clinical deployment while preserving data privacy.
{"title":"FedVGM: Enhancing Federated Learning Performance on Multi-Dataset Medical Images With XAI.","authors":"Mst Sazia Tahosin, Md Alif Sheakh, Mohammad Jahangir Alam, Md Mehedi Hassan, Anupam Kumar Bairagi, Shahab Abdulla, Samah Alshathri, Walid El-Shafai","doi":"10.1109/JBHI.2025.3600361","DOIUrl":"10.1109/JBHI.2025.3600361","url":null,"abstract":"<p><p>Advances in deep learning have transformed medical imaging, yet progress is hindered by data privacy regulations and fragmented datasets across institutions. To address these challenges, we propose FedVGM, a privacy-preserving federated learning framework for multi-modal medical image analysis. FedVGM integrates four imaging modalities, including brain MRI, breast ultrasound, chest X-ray, and lung CT, across 14 diagnostic classes without centralizing patient data. Using transfer learning and an ensemble of VGG16 and MobileNetV2, FedVGM achieves 97.7% $pm$ 0.01 accuracy on the combined dataset and 91.9-99.1% across individual modalities. We evaluated three aggregation strategies and demonstrated median aggregation to be the most effective. To ensure clinical interpretability, we apply explainable AI techniques and validate results through performance metrics, statistical analysis, and k-fold cross-validation. FedVGM offers a robust, scalable solution for collaborative medical diagnostics, supporting clinical deployment while preserving data privacy.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1272-1285"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hypertension is a critical cardiovascular risk factor, underscoring the necessity of accessible blood pressure (BP) monitoring for its prevention, detection, and management. While cuffless BP estimation using wearable cardiovascular signals via deep learning models (DLMs) offers a promising solution, their implementation often entails high computational costs. This study addresses these challenges by proposing an end-to-end broad learning model (BLM) for efficient cuffless BP estimation. Unlike DLMs that prioritize network depth, the BLM increases network width, thereby reducing computational complexity and enhancing training efficiency for continuous BP estimation. An incremental learning mode is also explored to provide high memory efficiency and flexibility. Validation on the University of California Irvine (UCI) database (403.67 hours) demonstrated that the standard BLM (SBLM) achieved a mean absolute error (MAE) of 11.72 mmHg for arterial BP (ABP) waveform estimation, performance comparable to DLMs such as long short-term memory (LSTM) and the one-dimensional convolutional neural network (1D-CNN), while improving training efficiency by 25.20 times. The incremental BLM (IBLM) offered horizontal scalability by expanding through node addition in a single layer, maintaining predictive performance while reducing storage demands through support for incremental learning with streaming or partial datasets. For systolic and diastolic BP prediction, the SBLM achieved MAEs (mean error $\pm$ standard deviation) of 3.04 mmHg (2.85 $\pm$ 4.15 mmHg) and 2.57 mmHg (-2.47 $\pm$ 3.03 mmHg), respectively. This study highlights the potential of BLM for personalized, real-time, continuous cuffless BP monitoring, presenting a practical solution for healthcare applications.
{"title":"Continuous Cuffless Blood Pressure Estimation via Effective and Efficient Broad Learning Model.","authors":"Chunlin Zhang, Pingyu Hu, Zhan Shen, Xiaorong Ding","doi":"10.1109/JBHI.2025.3604464","DOIUrl":"10.1109/JBHI.2025.3604464","url":null,"abstract":"<p><p>Hypertension is a critical cardiovascular risk factor, underscoring the necessity of accessible blood pressure (BP) monitoring for its prevention, detection, and management. While cuffless BP estimation using wearable cardiovascular signals via deep learning models (DLMs) offers a promising solution, their implementation often entails high computational costs. This study addresses these challenges by proposing an end-to-end broad learning model (BLM) for efficient cuffless BP estimation. Unlike DLMs that prioritize network depth, the BLM increases network width, thereby reducing computational complexity and enhancing training efficiency for continuous BP estimation. An incremental learning mode is also explored to provide high memory efficiency and flexibility. Validation on the University of California Irvine (UCI) database (403.67 hours) demonstrated that the standard BLM (SBLM) achieved a mean absolute error (MAE) of 11.72 mmHg for arterial BP (ABP) waveform estimation, performance comparable to DLMs such as long short-term memory (LSTM) and the one-dimensional convolutional neural network (1D-CNN), while improving training efficiency by 25.20 times. The incremental BLM (IBLM) offered horizontal scalability by expanding through node addition in a single layer, maintaining predictive performance while reducing storage demands through support for incremental learning with streaming or partial datasets. For systolic and diastolic BP prediction, the SBLM achieved MAEs (mean error $pm$ standard deviation) of 3.04 mmHg (2.85 $pm$ 4.15 mmHg) and 2.57 mmHg (-2.47 $pm$ 3.03 mmHg), respectively. This study highlights the potential of BLM for personalized, real-time, continuous cuffless BP monitoring, presenting a practical solution for healthcare applications.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1101-1114"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early detection and proper treatment of epilepsy are essential and meaningful to those who suffer from this disease. The adoption of deep learning (DL) techniques for automated epileptic seizure detection using electroencephalography (EEG) signals has shown great potential in making the most appropriate and fast medical decisions. However, DL algorithms have high computational complexity and suffer low accuracy with imbalanced medical data in multi-class seizure classification tasks. Motivated by these challenges, we present a simple and effective hybrid DL approach for epileptic seizure detection in EEG signals. Specifically, we first use a K-means synthetic minority oversampling technique (SMOTE) to balance the training data. Second, we integrate a 1D convolutional neural network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network based on Truncated Backpropagation Through Time (TBPTT) to efficiently extract spatial and temporal sequence information while reducing computational complexity. Finally, the proposed DL architecture uses softmax and sigmoid classifiers at the classification layer to perform the multi-class and binary seizure-classification tasks. In addition, 10-fold cross-validation is performed to show the significance of the proposed DL approach. Experimental results on the publicly available UCI epileptic seizure recognition dataset show better performance in terms of precision, sensitivity, specificity, and F1-score over several baseline DL algorithms and recent state-of-the-art techniques.
{"title":"A Hybrid Deep Learning Approach for Epileptic Seizure Detection in EEG signals.","authors":"Ijaz Ahmad, Xin Wang, Danish Javeed, Prabhat Kumar, Oluwarotimi Williams Samuel, Shixiong Chen","doi":"10.1109/JBHI.2023.3265983","DOIUrl":"10.1109/JBHI.2023.3265983","url":null,"abstract":"<p><p>Early detection and proper treatment of epilepsy is essential and meaningful to those who suffer from this disease. The adoption of deep learning (DL) techniques for automated epileptic seizure detection using electroencephalography (EEG) signals has shown great potential in making the most appropriate and fast medical decisions. However, DL algorithms have high computational complexity and suffer low accuracy with imbalanced medical data in multi seizure-classification task. Motivated from the aforementioned challenges, we present a simple and effective hybrid DL approach for epileptic seizure detection in EEG signals. Specifically, first we use a K-means Synthetic minority oversampling technique (SMOTE) to balance the sampling data. Second, we integrate a 1D convolutional neural network (CNN) with a Bidirectional Long Short-Term Memory (BiLSTM) network based on Truncated Backpropagation Through Time (TBPTT) to efficiently extract spatial and temporal sequence information while reducing computational complexity. Finally, the proposed DL architecture uses softmax and sigmoid classifiers at the classification layer to perform multi and binary seizure-classification tasks. In addition, the 10-fold cross-validation technique is performed to show the significance of the proposed DL approach. Experimental results using the publicly available UCI epileptic seizure recognition data set shows better performance in terms of precision, sensitivity, specificity, and F1-score over some baseline DL algorithms and recent state-of-the-art techniques.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1019-1029"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9274448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate motor imagery (MI) classification in EEG-based brain-computer interfaces (BCIs) is essential for applications in engineering, medicine, and artificial intelligence. Due to the limitations of single-model approaches, hybrid model architectures have emerged as a promising direction. In particular, convolutional neural networks (CNNs) and vision transformers (ViTs) demonstrate strong complementary capabilities, leading to enhanced performance. This study proposes a series of novel models, termed CNNViT-MI, to explore the synergy of CNNs and ViTs for MI classification. Specifically, five fusion strategies were defined: parallel integration, sequential integration, hierarchical integration, early fusion, and late fusion. Based on these strategies, eight candidate models were developed. Experiments were conducted on four datasets: BCI Competition IV dataset 2a, BCI Competition IV dataset 2b, the High Gamma dataset, and a self-collected MI-GS dataset. The results demonstrate that CNNViT-MILF-a achieves the best performance among all candidates by leveraging ViT as the backbone for global feature extraction and incorporating CNN-based local representations through a late fusion strategy. Compared to the best-performing state-of-the-art (SOTA) methods, mean accuracy improved by 2.27%, 2.31%, 0.74%, and 2.50% on the respective datasets, confirming the model's effectiveness and broad applicability; other metrics showed similar improvements. In addition, significance analysis, ablation studies, and visualization analysis were conducted, and corresponding clinical integration and rehabilitation protocols were developed to support practical use in healthcare.
{"title":"CNNViT-MILF-a: A Novel Architecture Leveraging the Synergy of CNN and ViT for Motor Imagery Classification.","authors":"Zhenxi Zhao, Yingyu Cao, Hongbin Yu, Huixian Yu, Junfen Huang","doi":"10.1109/JBHI.2025.3587026","DOIUrl":"10.1109/JBHI.2025.3587026","url":null,"abstract":"<p><p>Accurate motor imagery (MI) classification in EEG-based brain-computer interfaces (BCIs) is essential for applications in engineering, medicine, and artificial intelligence. Due to the limitations of single-model approaches, hybrid model architectures have emerged as a promising direction. In particular, convolutional neural networks (CNNs) and vision transformers (ViTs) demonstrate strong complementary capabilities, leading to enhanced performance. This study proposes a series of novel models, termed as CNNViT-MI, to explore the synergy of CNNs and ViTs for MI classification. Specifically, five fusion strategies were defined: parallel integration, sequential integration, hierarchical integration, early fusion, and late fusion. Based on these strategies, eight candidate models were developed. Experiments were conducted on four datasets: BCI competition IV dataset 2a, BCI competition IV dataset 2b, high gamma dataset, and a self-collected MI-GS dataset. The results demonstrate that CNNViT-MILF-a achieves the best performance among all candidates by leveraging ViT as the backbone for global feature extraction and incorporating CNN-based local representations through a late fusion strategy. Compared to the best-performing state-of-the-art (SOTA) methods, mean accuracy was improved by 2.27%, 2.31%, 0.74%, and 2.50% on the respective datasets, confirming the model's effectiveness and broad applicability, other metrics showed similar improvements. In addition, significance analysis, ablation studies, and visualization analysis were conducted, and corresponding clinical integration and rehabilitation protocols were developed to support practical use in healthcare.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1153-1165"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144583776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3599643
Alberto Altozano, Maria Eleonora Minissi, Lucia Gomez-Zaragoza, Luna Maddalon, Mariano Alcaniz, Javier Marin-Morales
Open-ended questionnaires allow respondents to express themselves freely, capturing richer information than closed-ended formats, but they are harder to analyze. Recent natural language processing advancements enable automatic assessment of open-ended responses, yet their use in psychological classification is underexplored. This study proposes a methodology using pre-trained large language models (LLMs) for the automatic classification of open-ended questionnaires, applied to autism spectrum disorder (ASD) classification via parental reports. We compare multiple training strategies using transcribed responses from 51 parents (26 with typically developing children, 25 with children with ASD), exploring variations in model fine-tuning, input representation, and specificity. Subject-level predictions are derived by aggregating 12 individual question responses. Our best approach achieved 84% subject-wise accuracy and 1.0 ROC-AUC using an OpenAI embedding model, per-question training, including the questions in the input, and combining the predictions with a voting system. In addition, a zero-shot evaluation using GPT-4o was conducted, yielding comparable results and underscoring the potential of both compact, local models and large out-of-the-box LLMs. To enhance transparency, we explored interpretability methods. Proprietary LLMs like GPT-4o offered no direct explanation, and OpenAI embedding models showed limited interpretability, whereas locally deployable LLMs provided the highest interpretability. This highlights a trade-off between the performance of proprietary models and the explainability of local models. Our findings validate LLMs for automatically classifying open-ended questionnaires, offering a scalable, cost-effective complement for ASD assessment. These results suggest broader applicability to the psychological analysis of other conditions, advancing LLM use in mental health research.
{"title":"Enhancing Psychological Assessments With Open-Ended Questionnaires and Large Language Models: An ASD Case Study.","authors":"Alberto Altozano, Maria Eleonora Minissi, Lucia Gomez-Zaragoza, Luna Maddalon, Mariano Alcaniz, Javier Marin-Morales","doi":"10.1109/JBHI.2025.3599643","DOIUrl":"10.1109/JBHI.2025.3599643","url":null,"abstract":"<p><p>Open-ended questionnaires allow respondents to express freely, capturing richer information than close-ended formats, but they are harder to analyze. Recent natural language processing advancements enable automatic assessment of open-ended responses, yet its use in psychological classification is underexplored. This study proposes a methodology using pre-trained large language models (LLMs) for automatic classification of open-ended questionnaires, applied to autism spectrum disorder (ASD) classification via parental reports. We compare multiple training strategies using transcribed responses from 51 parents (26 with typically developing children, 25 with ASD), exploring variations in model fine-tuning, input representation, and specificity. Subject-level predictions are derived by aggregating 12 individual question responses. Our best approach achieved 84% subject-wise accuracy and 1.0 ROC-AUC using an OpenAI embedding model, per-question training, including questions in the input, and combining the predictions with a voting system. In addition, a zero-shot evaluation using GPT-4o was conducted, yielding comparable results, underscoring the potential of both compact, local models and large out-of-the-box LLMs. To enhance transparency, we explored interpretability methods. Proprietary LLMs like GPT-4o offered no direct explanation, and OpenAI embedding models showed limited interpretability. However, locally deployable LLMs provided the highest interpretability. This highlights a trade-off between proprietary models' performance and local models' explainability. Our findings validate LLMs for automatically classifying open-ended questionnaires, offering a scalable, cost-effective complement for ASD assessment. These results suggest broader applicability for psychological analysis of other conditions, advancing LLM use in mental health research.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1707-1720"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144859134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3639185
Shaocong Mo, Ming Cai, Lanfen Lin, Ruofeng Tong, Fang Wang, Qingqing Chen, Wenbin Ji, Yinhao Li, Hongjie Hu, Yen-Wei Chen
Multimodal magnetic resonance imaging (MRI) is instrumental in differentiating liver lesions. The major challenge involves modeling reliable connections while simultaneously learning complementary information across various MRI sequences. While previous studies have primarily focused on multimodal integration in a pair-wise manner using few modalities, our research seeks to advance a more comprehensive understanding of interaction modeling by establishing complex high-order correlations among the diverse modalities in multimodal MRI. In this paper, we introduce a multimodal graph learning approach with a multi-hypergraph reasoning network to capture the full spectrum of both pair-wise and group-wise relationships among different modalities. Specifically, a weight-shared encoder extracts features from region-of-interest (ROI) images across all modalities. Subsequently, a collection of uniform hypergraphs is constructed with varying vertex configurations, allowing the modeling of not only pair-wise correlations but also high-order collaborations for relational reasoning. Following information propagation through hypergraph message passing, an adaptive intra-modality fusion module is proposed to effectively fuse feature representations from different hypergraphs of the same modality. Finally, all refined features are concatenated for the classification task. Our experimental evaluations, including focal liver lesion classification on the LLD-MMRI2023 dataset and early recurrence prediction of hepatocellular carcinoma on our internal datasets, demonstrate that our method significantly surpasses existing approaches, indicating the effectiveness of our model in handling both pair-wise and group-wise interactions across multiple modalities.
{"title":"Multimodal Graph Learning With Multi-Hypergraph Reasoning Networks for Focal Liver Lesion Classification in Multimodal Magnetic Resonance Imaging.","authors":"Shaocong Mo, Ming Cai, Lanfen Lin, Ruofeng Tong, Fang Wang, Qingqing Chen, Wenbin Ji, Yinhao Li, Hongjie Hu, Yen-Wei Chen","doi":"10.1109/JBHI.2025.3639185","DOIUrl":"10.1109/JBHI.2025.3639185","url":null,"abstract":"<p><p>Multimodal magnetic resonance imaging (MRI) is instrumental in differentiating liver lesions. The major challenge involves modeling reliable connections and simultaneously learning complementary information across various MRI sequences. While previous studies have primarily focused on multimodal integration in a pair-wise manner using few modalities, our research seeks to advance a more comprehensive understanding of interaction modeling by establishing complex high-order correlations among the diverse modalities in multimodal MRI. In this paper, we introduce a multimodal graph learning with multi-hypergraph reasoning network to capture the full spectrum of both pair-wise and group-wise relationships among different modalities. Specifically, a weight-shared encoder extracts features from regions of interest (ROI) images across all modalities. Subsequently, a collection of uniform hypergraphs are constructed with varying vertex configurations, allowing for the modeling of not only pair-wise correlations but also the high-order collaborations for relational reasoning. Following information propagation through the hypergraph message passing, adaptive intra-modality fusion module is proposed to effectively fuse feature representations from different hypergraphs of the same modality. Finally, all refined features are concatenated to prepare for the classification task. Our experimental evaluations, including focal liver lesions classification using the LLD-MMRI2023 dataset and early recurrence prediction of hepatocellular carcinoma using our internal datasets, demonstrate that our method significantly surpasses the performance of existing approaches, indicating the effectiveness of our model in handling both pair-wise and group-wise interactions across multiple modalities.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1404-1417"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145654299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/JBHI.2025.3597997
Jun Yang, Chen Zhu, Renbiao Wu
Non-contact heart rate detection technology leverages changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face various limitations, including restricted feature extraction capabilities, redundant spatial information, and ineffective motion artifact processing. To address these problems, a novel end-to-end heart rate estimation network, the Spatial-Temporal-Channel Network (STCNet), is proposed. First, to address the redundant spatial information in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight the effective information in the facial region. Next, an improved temporal shift module (TSMP) with long-range temporal information perception is proposed. On this basis, a temporal-channel learning (TCL) unit is designed to enable information interaction across the channels of different frames, addressing the insufficient capability of existing models in extracting periodic features of the heartbeat. Finally, combining the SAL and TCL units, a feature extraction block (FEB) is designed, and a feature extraction network is constructed by stacking multiple FEB layers to achieve accurate heart rate estimation. Numerous experiments conducted on the UBFC-rPPG and PURE datasets verify the effectiveness and generalization ability of our model. Notably, compared to the state-of-the-art CIN-rPPG, our model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE) in intra-dataset testing on the PURE dataset. Experimental results demonstrate that our proposed model outperforms other mainstream models.
{"title":"Robust Remote Heart Rate Estimation Network Based on Spatial-Temporal-Channel Learning From Facial Videos.","authors":"Jun Yang, Chen Zhu, Renbiao Wu","doi":"10.1109/JBHI.2025.3597997","DOIUrl":"10.1109/JBHI.2025.3597997","url":null,"abstract":"<p><p>Non-contact heart rate detection technology leverages changes in skin color to estimate heart rate, enhancing the convenience of health monitoring, particularly in situations requiring real-time, contact-free observation. However, current video-based methods face various limitations, including restricted feature extraction capabilities, redundant spatial information, and ineffective motion artifact processing. To address these problems, a novel end-to-end heart rate estimation network, Spatial-Temporal-Channel Network (STCNet), is proposed. Firstly, in order to solve the problem of redundant spatial information in current video-based heart rate estimation methods, a spatial attention learning (SAL) unit is designed to highlight the effective information of the facial region. Next, an improved temporal shift module (TSMP) with long-range temporal information perception is proposed. On this basis, A temporal-channel learning (TCL) unit is designed to achieve the interaction of information across different frames' channels, aiming to address the insufficient capability of existing models in extracting periodic features of heartbeat. Finally, combining the SAL and TCL units, a feature extraction block (FEB) is designed. A feature extraction network is constructed by stacking multiple layers of FEBs to achieve accurate heart rate estimation. Numerous experiments are conducted on the UBFC-rPPG dataset and the PURE dataset to verify the effectiveness and generalization ability of our model. Notably, compared to the state-of-the-art CIN-rPPG, our model achieves a 0.27 bpm reduction in mean absolute error (MAE) and a 0.19 bpm reduction in root mean square error (RMSE), in intra-dataset testing on the PURE dataset. Experimental results demonstrate that our proposed model outperforms other mainstream models.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":"1258-1271"},"PeriodicalIF":6.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144834977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}