The thickness of the diaphragm serves as a crucial biometric indicator, particularly in assessing rehabilitation and respiratory dysfunction. However, measuring diaphragm thickness from ultrasound images mainly depends on manual delineation of the fascia, which is subjective, time-consuming, and sensitive to the inherent speckle noise. In this study, we introduce an edge-aware diffusion segmentation model (ESADiff), which incorporates prior structural knowledge of the fascia to improve the accuracy and reliability of diaphragm thickness measurements in ultrasound imaging. We first apply a diffusion model, guided by annotations, to learn the image features while preserving edge details through an iterative denoising process. Specifically, we design an anisotropic edge-sensitive annotation refinement module that corrects inaccurate labels by integrating Hessian geometric priors with a backtracking shortest-path connection algorithm, further enhancing model accuracy. Moreover, a curvature-aware deformable convolution and edge-prior ranking loss function are proposed to leverage the shape prior knowledge of the fascia, allowing the model to selectively focus on relevant linear structures while mitigating the influence of noise on feature extraction. We evaluated the proposed model on an in-house diaphragm ultrasound dataset, a public calf muscle dataset, and an internal tongue muscle dataset to demonstrate robust generalization. Extensive experimental results demonstrate that our method achieves finer fascia segmentation and significantly improves the accuracy of thickness measurements compared to other state-of-the-art techniques, highlighting its potential for clinical applications.
Edge-Aware Diffusion Segmentation Model With Hessian Priors for Automated Diaphragm Thickness Measurement in Ultrasound Imaging. Chen-Long Miao, Yikang He, Baike Shi, Zhongkai Bian, Wenxue Yu, Yang Chen, Guang-Quan Zhou. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 1544-1554. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3601567
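The Hessian geometric prior mentioned in the abstract above can be illustrated with a small sketch. This is not the ESADiff refinement module. It is a minimal example, assuming a grayscale image stored as nested lists, of how the eigenvalues of a finite-difference Hessian single out bright line-like structures such as fascia. The names `hessian_eigenvalues` and `ridge_strength` and the particular ridge score are illustrative assumptions.

```python
import math


def hessian_eigenvalues(img, r, c):
    """Eigenvalues (smaller, larger) of the 2x2 Hessian at an interior pixel,
    estimated with central finite differences."""
    dxx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    dyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    dxy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    mean = (dxx + dyy) / 2.0
    dev = math.sqrt(((dxx - dyy) / 2.0) ** 2 + dxy ** 2)
    return mean - dev, mean + dev


def ridge_strength(img, r, c):
    """Assumed score: large when one eigenvalue is strongly negative (intensity
    drops sharply across the line) and the other is near zero (along the line)."""
    lo, hi = hessian_eigenvalues(img, r, c)
    return max(0.0, -lo - abs(hi))


# Toy image: one bright horizontal line on a dark background.
line_image = [[0.0] * 5 for _ in range(5)]
line_image[2] = [1.0] * 5

# Toy image: one isolated bright pixel (a blob, not a line).
blob_image = [[0.0] * 5 for _ in range(5)]
blob_image[2][2] = 1.0

on_line = ridge_strength(line_image, 2, 2)
off_line = ridge_strength(line_image, 1, 2)
on_blob = ridge_strength(blob_image, 2, 2)
```

The score is high on the line and zero both next to it and on the isolated pixel, which is the kind of selectivity a linear-structure prior relies on.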
Pub Date : 2026-02-01 DOI: 10.1109/JBHI.2025.3634072
Usman Anwar, Adeel Hussain, Mandar Gogate, Kia Dashtipour, Tughrul Arslan, Amir Hussain, Peter Lomax
The detection of listening effort or cognitive load (CL) has been a major research challenge in recent years. Most conventional techniques utilise physiological or audio-visual sensors and are privacy-invasive and computationally complex. Synchronization, data alignment and accessibility limitations can increase noise and error probability, compromising the accuracy of CL estimates. This innovative work presents a multi-modal, non-invasive and privacy-preserving approach that combines Radio Frequency (RF) and pupillometry sensing to address these challenges. Custom RF sensors are first designed and developed to capture blood flow changes in specific brain regions with high spatial resolution. Next, multi-modal fusion with pupillometry sensing is proposed and shown to offer a robust assessment of cognitive and listening effort through pupil size and pupil dilation. Our novel approach evaluates RF sensing to estimate CL from cerebral blood flow variations, using pupillometry as a baseline. A first-of-its-kind multi-modal dataset is collected as a new benchmark resource in a controlled environment, with participants comprehending target speech at varying background noise levels. The framework is statistically evaluated using intraclass correlation for pupillometry data (average ICC > 0.95). The correlation between pupillometry and RF data is established through Pearson's correlation (average PCC > 0.79). Further, CL is classified into high and low categories based on RF data using K-means clustering. Future work involves integrating RF sensors with glasses to estimate listening effort for hearing-aid users and utilising RF measurements to optimize speech enhancement based on an individual's listening effort and the complexity of the acoustic environment.
Multimodal Cognitive Load Estimation With Radio Frequency Sensing and Pupillometry in Complex Auditory Environments. Usman Anwar, Adeel Hussain, Mandar Gogate, Kia Dashtipour, Tughrul Arslan, Amir Hussain, Peter Lomax. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 1605-1617. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3634072
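The two statistical steps named in the abstract above, Pearson's correlation between modalities and a two-class K-means split of the RF data, can be sketched as follows. The sample values are made up for illustration and are not from the study; `pearson` and `two_means` are hypothetical helper names, and the one-dimensional two-cluster K-means is a simplification of whatever feature space the authors actually clustered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def two_means(values, iters=100):
    """1-D K-means with k=2; label 0 is the low cluster, label 1 the high one."""
    lo, hi = min(values), max(values)
    if lo == hi:  # degenerate case: everything in one cluster
        return [0] * len(values)
    labels = []
    for _ in range(iters):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels


# Made-up per-trial features (illustrative only).
rf_feature = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95]
pupil_size = [2.00, 2.10, 2.05, 3.00, 3.20, 3.10]

pcc = pearson(rf_feature, pupil_size)
load_labels = two_means(rf_feature)  # 0 = low load, 1 = high load
```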
The Internet of Medical Things (IoMT) has transformed traditional healthcare systems by enabling real-time monitoring, remote diagnostics, and data-driven treatment. However, security and privacy remain significant concerns for IoMT adoption due to the sensitive nature of medical data. Therefore, we propose an integrated framework leveraging blockchain and explainable artificial intelligence (XAI) to enable secure, intelligent, and transparent management of IoMT data. First, the traceability and tamper resistance of blockchain are used to secure IoMT data transactions, which are formulated as a two-stage Stackelberg game. A dual-chain architecture ensures the security and privacy of each transaction: the main chain manages regular IoMT data transactions, while the side chain handles data trading activities aimed at resale. Simultaneously, perceptual hashing is used for data rights confirmation, which protects the rights and interests of each participant in the transaction. Subsequently, medical time-series data is modeled using bidirectional simple recurrent units to detect anomalies and cyberthreats accurately while overcoming vanishing gradients. Lastly, an adversarial sample generation method based on local interpretable model-agnostic explanations is provided to evaluate, secure, and improve the anomaly detection model, as well as to make it more explainable and resilient to possible adversarial attacks. Simulation results illustrate the high performance of the integrated secure data management framework leveraging blockchain and XAI, compared with the benchmarks.
XAI Driven Intelligent IoMT Secure Data Management Framework. Wei Liu, Feng Zhao, Lewis Nkenyereye, Shalli Rani, Keqin Li, Jianhui Lv. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 935-946. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2024.3408215
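A two-stage Stackelberg game, as named in the abstract above, is solved by backward induction: the follower's best response is derived first, and the leader then optimizes against it. The sketch below assumes a textbook setup that is not taken from the paper: a data seller (leader) posts a unit price, and a buyer (follower) with utility `a*ln(1+q) - price*q` chooses a quantity `q`. All function names, the utility form, and the parameters `a` and `cost` are hypothetical.

```python
import math


def follower_best_response(price, a):
    """Stage 2: the buyer maximizes a*ln(1+q) - price*q over q >= 0.
    Setting the derivative a/(1+q) - price to zero gives q = a/price - 1."""
    return max(0.0, a / price - 1.0)


def leader_profit(price, a, cost):
    """Stage 1 objective: margin times the quantity the buyer will choose."""
    return (price - cost) * follower_best_response(price, a)


def solve_leader(a, cost, steps=10000):
    """Grid search over prices in (cost, a]; returns (best_price, best_profit)."""
    best_price, best_profit = cost, 0.0
    for i in range(1, steps + 1):
        price = cost + (a - cost) * i / steps
        profit = leader_profit(price, a, cost)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price, best_profit


a, cost = 4.0, 1.0
price_star, profit_star = solve_leader(a, cost)
quantity_star = follower_best_response(price_star, a)
```

For this assumed utility the leader's problem also has a closed form, `price = sqrt(a * cost)`, which the grid search should approach.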
Deep learning has significantly advanced medical image processing, yet the personally identifiable information (PII) inherent in medical images (such as facial features, distinctive anatomical structures, rare lesions, or specific textural patterns) poses a critical risk to patient privacy during data transmission. To mitigate this risk, we introduce the Medical Semantic Diffusion Model (MSDM), a novel framework that synthesizes medical images guided by semantic information. The synthesized images follow the same distribution as the original data, which effectively removes the PII of the original data and ensures robust privacy protection. Unlike conventional techniques that combine semantic and noisy images for denoising, MSDM integrates Adaptive Batch Normalization (AdaBN) to encode semantic information into a high-dimensional latent space, embedding it directly within the denoising neural network. This approach enhances image quality and semantic accuracy while ensuring that the synthetic and original images belong to the same distribution. In addition, to further accelerate synthesis and reduce dependency on manually crafted semantic masks, we propose the Spread Algorithm, which automatically generates these masks. Extensive experiments conducted on the BraTS 2021, MSD Lung, DSB18, and FIVES datasets confirm the efficacy of MSDM, yielding state-of-the-art results across several performance metrics. Augmenting datasets with MSDM-generated images in nnUNet segmentation experiments led to Dice scores of 0.6243, 0.9531, 0.9406, and 0.9562, underscoring its potential for enhancing both image quality and privacy-preserving data augmentation.
A Semantic Conditional Diffusion Model for Enhanced Personal Privacy Preservation in Medical Images. Shudong Wang, Zhiyuan Zhao, Yawu Zhao, Luqi Wang, Yuanyuan Zhang, Jiehuan Wang, Sibo Qiao, Zhihan Lyu. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 853-864. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2024.3511583
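The Dice scores quoted in the abstract above measure the overlap between a predicted and a reference segmentation mask, defined as twice the intersection divided by the sum of the two mask sizes. A minimal sketch, assuming binary masks flattened to lists of 0/1 values (the function name `dice_score` and the toy masks are illustrative, not from the paper):

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for flat binary masks.
    Two empty masks are treated as a perfect match by convention."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D masks: the prediction covers one extra pixel.
predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]

score = dice_score(predicted, reference)
```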
Pub Date : 2026-02-01 DOI: 10.1109/JBHI.2025.3565271
Yirui Wu, Xinfu Liu, Lucia Cascone, Michele Nappi, Shaohua Wan
There is rising concern about healthcare system security, where data loss could cause serious harm to patients and hospitals. As a promising encryption method for medical images, DNA encoding offers high speed, parallel computation, minimal storage, and unbreakable cryptosystems. Inspired by the idea of using Large Language Models (LLMs) to improve DNA encoding, we propose a medical image encryption method with LLM-enhanced DNA encoding, which consists of an LLM enhancing module and a content-aware permutation and diffusion module. Because medical images generally have plain backgrounds with low-entropy pixels, the first module compresses pixels into highly compact signals that vary probabilistically and are plausibly deniable, serving as an additional LLM-based layer of defense against privacy breaches before DNA encoding. The second module not only adds permutation by randomly sampling from the redundant correlation between adjacent pixels to break the internal links between pixels, but also performs a DNA-based diffusion process to greatly increase the complexity of cracking. Experiments on the ChestXray-14, COVID-CT and fcon-1000 datasets show that the proposed method outperforms all comparative methods in sensitivity, correlation and entropy.
Plausible Deniable Medical Image Encryption by Large Language Models and Reversible Content-Aware Strategy. Yirui Wu, Xinfu Liu, Lucia Cascone, Michele Nappi, Shaohua Wan. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 947-957. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3565271
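DNA encoding, the building block named in the abstract above, maps each pair of bits to one of the four nucleotide symbols, so one 8-bit pixel becomes four bases. The sketch below shows only this reversible mapping, under one assumed rule (00 to A, 01 to C, 10 to G, 11 to T). It does not reproduce the paper's LLM module, permutation, or diffusion steps, and it is not an encryption scheme on its own.

```python
BASES = "ACGT"  # assumed coding rule: 00->A, 01->C, 10->G, 11->T


def dna_encode(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_decode(seq: str) -> bytes:
    """Inverse of dna_encode; the sequence length must be a multiple of 4."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


pixels = bytes([27, 255, 0])  # toy pixel values
strand = dna_encode(pixels)
restored = dna_decode(strand)
```

Real schemes typically choose among several such coding rules per pixel under key control. The fixed rule here is a simplification.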
Pub Date : 2026-02-01 DOI: 10.1109/JBHI.2025.3586906
Congming Tan, Jiayang Xu, Liangliang Hu, Yin Tian
Emotional neuromodulation refers to the direct manipulation of the nervous system using techniques such as electrical or magnetic stimulation to manage and adjust an individual's emotional experiences. Transcranial electrical stimulation (tES) targeting the right ventrolateral prefrontal cortex (rVLPFC) has been widely used to modulate emotions. However, the impact of emotions on brain network changes and modulation during tES remains unclear. In this study, we developed a subject-adaptive dynamic graph convolution network with fused features (FusSADGCNN) to decode the impact of tES on neuromodulation for emotion recognition and emotion elicitation. Specifically, we developed a fused feature, CPE, which integrates the average sub-frequency phase-locking value representing global functional connectivity with differential entropy characterizing local activation to explore network differences across emotional states, while incorporating an improved dynamic graph convolution to adaptively integrate multi-receptive neighborhood information for precise decoding of individual tES effects. On the SEED dataset and our laboratory data, the FusSADGCNN model outperforms state-of-the-art methods. Furthermore, we utilized these tools to assess the emotional modulation states induced by tES. Results indicated that in the experiment involving music-elicited emotional modulation, the tools effectively identified improvements in negative emotions under true stimulation, with predictive accuracy significantly related to the average connectivity strength of the brain network. In the active facial emotion recognition modulation experiment, joint stimulation of the rVLPFC and the temporo-parietal junction achieved better modulation effects. These findings highlight that FusSADGCNN effectively evaluates the neuromodulation states during tES-induced emotional regulation, providing a reliable foundation for integrating emotion recognition and neuromodulation.
FusSADGCNN: Decoding the Impact of Transcranial Electrical Stimulation on Neuromodulation in Emotion Recognition and Emotion Elicitation. Congming Tan, Jiayang Xu, Liangliang Hu, Yin Tian. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 1087-1100. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3586906
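The two ingredients of the fused feature described in the abstract above have standard definitions that can be sketched directly. The phase-locking value (PLV) between two channels is the magnitude of the mean unit phasor of their phase difference, and the differential entropy of a band-limited signal under a Gaussian assumption is `0.5 * ln(2*pi*e*variance)`. The sketch assumes instantaneous phases are already available (in practice they would come from a Hilbert transform, omitted here), and how the paper combines the two into CPE is not reproduced.

```python
import cmath
import math


def phase_locking_value(phase_a, phase_b):
    """PLV = |mean(exp(i*(phase_a - phase_b)))|, between 0 and 1."""
    n = len(phase_a)
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / n


def differential_entropy(samples):
    """Differential entropy of the samples under a Gaussian assumption,
    using the population variance."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return 0.5 * math.log(2 * math.pi * math.e * variance)


# Two channels with a constant phase lag: perfectly locked.
phase_1 = [0.1 * k for k in range(50)]
phase_2 = [p + 0.5 for p in phase_1]
plv_locked = phase_locking_value(phase_1, phase_2)

# Phase differences spread evenly around the circle: no locking.
spread = [2 * math.pi * k / 8 for k in range(8)]
plv_unlocked = phase_locking_value(spread, [0.0] * 8)

de_unit_variance = differential_entropy([-1.0, 1.0])
```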
Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, with coronary artery disease (CAD) being the most prevalent form. To improve screening efficiency, there is a critical need for accurate, non-invasive, and cost-effective CAD detection methods. This study presents Co-Attention Dual-Modal ViT (CAD-ViT), a novel classification framework based on the Vision Transformer that integrates both electrocardiogram (ECG) and phonocardiogram (PCG) signals. Unlike prior approaches that process ECG and PCG features independently or fuse them through simple concatenation, the proposed model introduces two key modules: a Co-Attention mechanism that enables bidirectional cross-modal interaction to effectively capture complementary features between ECG and PCG signals, and a Dynamic Weighted Fusion (DWF) module that adaptively adjusts the contribution of each modality for robust feature fusion. CAD-ViT is evaluated on a private clinical dataset comprising 132 CAD and 101 non-CAD subjects, achieving an accuracy of 97.08%, precision of 97.18%, specificity of 98.52%, F1-score of 97.04, and recall of 96.94%. Additional validation on two public datasets confirms the model's robustness and generalization capability. These results demonstrate the effectiveness of the proposed approach and its potential for practical deployment in CAD screening using multimodal biosignals.
Integrating ECG and PCG Signals through a Dual-Modal ViT for Coronary Artery Disease Detection. Xu Liu, Ling You, Chengcong Lv, Mingyuan Chen, Lianhuan Wei, Yineng Zheng, Xingming Guo. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 1128-1139. Pub Date: 2026-02-01. DOI: 10.1109/JBHI.2025.3589257
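The idea of adaptively weighting two modalities before fusing them, as the Dynamic Weighted Fusion module in the abstract above does for ECG and PCG features, can be illustrated with a softmax gate. This is a generic sketch, not the paper's DWF module: the per-modality relevance scores are simply passed in here, whereas a real model would learn them, and the names `softmax` and `weighted_fusion` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features, scores):
    """Fuse equal-length feature vectors as a convex combination whose
    weights are the softmax of the given per-modality scores."""
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


ecg_features = [1.0, 0.0]  # toy embedding for the ECG branch
pcg_features = [0.0, 1.0]  # toy embedding for the PCG branch

# Equal scores: both modalities contribute equally.
fused_equal, weights_equal = weighted_fusion([ecg_features, pcg_features], [0.0, 0.0])

# A much higher ECG score: the fused vector leans towards the ECG branch.
fused_ecg, weights_ecg = weighted_fusion([ecg_features, pcg_features], [10.0, 0.0])
```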
Pub Date : 2026-02-01 DOI: 10.1109/JBHI.2025.3573954
Chengzhe Piao, Taiyu Zhu, Yu Wang, Stephanie E Baldeweg, Paul Taylor, Pantelis Georgiou, Jiahao Sun, Jun Wang, Kezhi Li
Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant "cold start" problem in patient care. Utilizing population models to address this challenge is a potential solution, but collecting patient data for training population models in a privacy-conscious manner is challenging, especially given that such data is often stored on personal devices. To protect privacy and address the "cold start" problem in diabetes care, we propose "GluADFL", blood Glucose prediction by Asynchronous Decentralized Federated Learning. We compared GluADFL with eight baseline methods using four distinct T1D datasets, comprising 298 participants, which demonstrated its superior performance in accurately predicting BG levels for cross-patient analysis. Furthermore, patients' data might be stored and shared across various communication networks in GluADFL, ranging from highly interconnected topologies (e.g., random, which performs best) to more structured ones (e.g., cluster and ring), suitable for various social networks. The asynchronous training framework supports flexible participation. By adjusting the ratio of inactive participants, we found that training remains stable if fewer than 70% are inactive. Our results confirm that GluADFL offers a practical, privacy-preserving solution for BG prediction in T1D, significantly enhancing the quality of diabetes management.
Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach. Chengzhe Piao, Taiyu Zhu, Yu Wang, Stephanie E Baldeweg, Paul Taylor, Pantelis Georgiou, Jiahao Sun, Jun Wang, Kezhi Li. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 839-852, 2026-02-01. DOI: 10.1109/JBHI.2025.3573954
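The asynchronous, decentralized training described in the GluADFL abstract can be illustrated with a toy gossip-averaging loop. This is only a sketch under stated assumptions, not the authors' algorithm: the ring topology helper, the plain parameter averaging, the synthetic parameter vectors, and the names `ring_neighbors`, `gossip_round`, and `spread` are all hypothetical. It shows how nodes can move toward a shared model without a central server even when a fraction of participants is inactive in each round.

```python
import random


def ring_neighbors(i, n):
    """Neighbors of node i on a ring of n nodes (hypothetical topology helper)."""
    return [(i - 1) % n, (i + 1) % n]


def gossip_round(params, inactive_ratio, rng):
    """One asynchronous round: every active node averages its parameter vector
    with those of its active ring neighbors; inactive nodes keep theirs."""
    n = len(params)
    active = [rng.random() >= inactive_ratio for _ in range(n)]
    new_params = [list(p) for p in params]
    for i in range(n):
        if not active[i]:
            continue
        peers = [i] + [j for j in ring_neighbors(i, n) if active[j]]
        for k in range(len(params[i])):
            new_params[i][k] = sum(params[j][k] for j in peers) / len(peers)
    return new_params


def spread(params):
    """Largest per-coordinate gap between any two nodes (0 means consensus)."""
    dims = range(len(params[0]))
    return max(max(p[k] for p in params) - min(p[k] for p in params) for k in dims)


rng = random.Random(0)
# Synthetic stand-ins for locally trained model parameters (assumption).
params = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
initial_spread = spread(params)
for _ in range(200):
    params = gossip_round(params, inactive_ratio=0.5, rng=rng)
final_spread = spread(params)
print(initial_spread, final_spread)
```

In a real system each node would also run local training steps on its own CGM data between averaging rounds; the sketch leaves that out to keep the communication pattern visible.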
Pub Date : 2026-02-01 DOI: 10.1109/JBHI.2025.3632356
Yongxu Zhao, Kequan Yang, Yuanchen Wu, Xiaoqiang Li
Radiology report generation aims to automatically produce diagnostic reports from medical images, reducing radiologists' workload. Most existing models use an encoder-decoder architecture, in which the text decoder generates reports from encoded image tokens. However, these approaches have two major limitations: 1) they use either a single-view feature or a multi-view feature produced by simple static fusion, which fails to capture complementary information from multi-view images, and 2) they lack explicit disease-related diagnostic information during text decoding, which reduces the clinical accuracy and relevance of the generated report. To address these limitations, this paper proposes a novel framework employing Multi-view Feature Integration and Enhanced Disease Prompting for Radiology Report Generation, called MFDP. Specifically, MFDP introduces two key innovations: 1) the Multi-view Feature Fusion (MFF) module dynamically integrates multi-view images (e.g., frontal and lateral views) through a multi-view attention mechanism that adaptively captures inter-view dependencies, enriching the decoder's input features to generate more comprehensive reports; 2) the Enhanced Disease Prompting (EDP) module provides explicit diagnostic information by constructing enhanced disease prompts to guide the text decoding process. Experiments on two benchmark datasets, MIMIC-CXR and IU X-Ray, demonstrate that the proposed MFDP is competitive in both Clinical Efficacy (CE) and Natural Language Generation (NLG) metrics. Notably, MFDP achieves a 10% average improvement in CE Recall compared to state-of-the-art (SOTA) models, enabling more precise localization of critical abnormalities while maintaining diagnostic completeness.
MFDP: Multi-View Feature Integration and Enhanced Disease Prompting for Radiology Report Generation. Yongxu Zhao, Kequan Yang, Yuanchen Wu, Xiaoqiang Li. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 1378-1391, 2026-02-01. DOI: 10.1109/JBHI.2025.3632356
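The multi-view attention idea in the MFDP abstract can be illustrated with plain scaled dot-product cross-attention, where tokens from one view query tokens from the other. This is a minimal sketch under stated assumptions, not the MFF module itself: there are no learned projections, the residual fusion is an assumption, the token values are toy numbers, and the names `cross_view_attention` and `fuse_views` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(queries, keys_values):
    """Scaled dot-product attention from one view's tokens to another's.
    Returns (attended tokens, attention weights)."""
    d = len(queries[0])
    attended, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        w = softmax(scores)
        weights.append(w)
        attended.append([sum(wj * v[c] for wj, v in zip(w, keys_values))
                         for c in range(d)])
    return attended, weights


def fuse_views(frontal, lateral):
    """Residual fusion (assumption): each frontal token plus the lateral
    context it attends to."""
    attended, weights = cross_view_attention(frontal, lateral)
    fused = [[f + a for f, a in zip(ft, at)]
             for ft, at in zip(frontal, attended)]
    return fused, weights


# Toy token embeddings standing in for encoded image patches (assumption).
frontal = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
lateral = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]]
fused, weights = fuse_views(frontal, lateral)
print(weights)
```

Because the weights are recomputed for every input pair, the contribution of each lateral token varies per image, which is the contrast with the static fusion the abstract criticizes.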
Pub Date : 2026-02-01 DOI: 10.1109/JBHI.2025.3591844
Yukai Huang, Ningbo Zhao, Dongmin Huang, Yonglong Ye, Zi Luo, Hongzhou Lu, Min He, Wenjin Wang
The diagnosis of peripheral artery disease (PAD) typically relies on specialized equipment such as ultrasound. The delayed detection associated with these approaches may lead to amputation and even death. To achieve rapid and ubiquitous PAD screening, we propose a novel concept of camera-based plantar perfusion imaging (CPPI) for PAD diagnosis and severity classification. Specifically, we performed a simulation trial in which an RGB camera recorded plantar videos of 20 subjects while a cuff applied different pressures to the left leg to simulate different degrees of lower limb blockage. We generated plantar perfusion maps using remote photoplethysmography imaging and proposed a multi-view perfusion (MVP) feature set to represent the perfusion maps for PAD classification. The experimental results show that the Pearson correlation coefficients between MVP and Doppler ultrasound (clinical reference) features were larger than 0.9. Combined with a Support Vector Machine, the MVP features achieved 91.47% accuracy in distinguishing normal from obstructed states and 76.48% accuracy in differentiating four degrees of vascular obstruction. The clinical benchmark demonstrated the potential of CPPI as a rapid, sensitive, and easy-to-use diagnostic tool for PAD, suitable for large-scale screening in home or community settings.
Plantar Perfusion Imaging for Peripheral Arterial Disease Screening: A Proof-of-Concept Study. Yukai Huang, Ningbo Zhao, Dongmin Huang, Yonglong Ye, Zi Luo, Hongzhou Lu, Min He, Wenjin Wang. IEEE Journal of Biomedical and Health Informatics, vol. PP, pp. 1520-1533, 2026-02-01. DOI: 10.1109/JBHI.2025.3591844
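The agreement check mentioned in the CPPI abstract, a Pearson correlation between camera-derived and Doppler ultrasound features, can be written out directly from the definition. This is a sketch only: the function name `pearson_r` is hypothetical, and the numbers are made-up placeholders chosen to show the call, not measurements from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Placeholder values (NOT data from the study): one camera-derived perfusion
# feature and one Doppler reference feature at four hypothetical cuff levels.
camera_feature = [0.92, 0.71, 0.48, 0.30]
doppler_feature = [61.0, 45.0, 33.0, 18.0]
r = pearson_r(camera_feature, doppler_feature)
print(r)
```

A coefficient near 1 means the camera feature tracks the reference almost linearly across the simulated obstruction levels, which is the sense in which the abstract reports values above 0.9.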