Pub Date: 2025-06-18; eCollection Date: 2025-09-01; DOI: 10.1007/s41666-025-00200-0
Aleksandr Ometov, Anzhelika Mezina, Hsiao-Chun Lin, Otso Arponen, Radim Burget, Jari Nurmi
Remote continuous patient monitoring is an essential feature of eHealth systems, offering opportunities for personalized care. Among its emerging applications, emotion and stress recognition hold significant promise, but face major challenges due to the subjective nature of emotions and the complexity of collecting and interpreting related data. This paper presents a review of open access multimodal datasets used in emotion and stress detection. It focuses on dataset characteristics, acquisition methods, and classification challenges, with attention to physiological signals captured by wearable devices, as well as advanced processing methods of these data. The findings show notable advances in data collection and algorithm development, but limitations remain, e.g., variability in real-world conditions, individual differences in emotional responses, and difficulties in objectively validating emotional states. The inclusion of self-reported and contextual data can enhance model performance, yet lacks consistency and reliability. Further barriers include privacy concerns, annotation of long-term data, and ensuring robustness in uncontrolled environments. By analyzing the current landscape and highlighting key gaps, this study contributes a foundation for future work in emotion recognition. Progress in the field will require privacy-preserving data strategies and interdisciplinary collaboration to develop reliable, scalable systems. These advances can enable broader adoption of emotion-aware technologies in eHealth and beyond.
Title: Stress and Emotion Open Access Data: A Review on Datasets, Modalities, Methods, Challenges, and Future Research Perspectives. Journal of Healthcare Informatics Research, vol. 9, no. 3, pp. 247-279. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12290141/pdf/
Pub Date: 2024-11-14; eCollection Date: 2025-03-01; DOI: 10.1007/s41666-024-00180-7
Matthew Pears, Karan Wadhwa, Stephen R Payne, Vishwanath Hanchanale, Mamoun Hamid Elmamoun, Sunjay Jain, Stathis Th Konstantinidis, Mark Rochester, Ruth Doherty, Kenneth Spearpoint, Oliver Ng, Lachlan Dick, Steven Yule, Chandra Shekhar Biyani
Non-technical skills (NTS) are crucial in healthcare, encompassing cognitive and social skills that support technical ability. Traditional NTS training is evolving with the emergence of artificial intelligence (AI) models that can intelligently converse with their users, known as large language models (LLMs). This study investigated the capabilities and limitations of a popular model named generative pre-trained transformer 4 (GPT-4) in NTS training, comparing its performance to that of human evaluators. Urology trainees identified NTS events in simulated scenarios and discussed them in blinded feedback sessions with AI and human consultants. Experts assessed the blinded interaction data, providing quantitative ratings and qualitative evaluations using annotated transcripts. Wilcoxon signed-rank tests compared pre- and post-intervention ratings, whilst Mann-Whitney U tests compared post-intervention ratings between AI and human feedback. Thematic analysis identified strengths, limitations, and differences between AI and human feedback approaches. The AI model demonstrated significant strengths in reinforcing knowledge gathering (p = 0.04), providing accurate and evidence-based feedback (p = 0.013), conveying empathy (p = 0.021), and tailoring explanations to complexity (p = 0.002). However, human feedback excelled in language terminology (p = 0.003), complexity (p = 0.020), and fact-based feedback (p = 0.025). The study highlights the potential for AI to augment assessment of NTS training in healthcare. A blended approach utilising AI and human expertise may boost training efficacy.
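The Mann-Whitney U comparison mentioned in the abstract above can be illustrated with a minimal pure-Python sketch. The ratings below are made-up placeholders, not data from the study, and the function names are hypothetical. Only the U statistic is computed; obtaining a p-value would normally be left to a statistics library such as SciPy.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the U statistics (U_a, U_b) for two independent samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical post-intervention ratings on a 1-5 scale (illustrative only).
ai_ratings = [4, 5, 3, 4, 5]
human_ratings = [3, 4, 2, 3, 4]
u_ai, u_human = mann_whitney_u(ai_ratings, human_ratings)
print(u_ai, u_human)
```

The two U values always sum to the product of the sample sizes, which is a quick sanity check on any implementation.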
Title: Non-technical Skills for Urology Trainees: A Double-Blinded Study of ChatGPT4 AI Benchmarking Against Consultant Interaction. Journal of Healthcare Informatics Research, vol. 9, no. 1, pp. 103-118. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782744/pdf/
Large language models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), potentially enabling new ways to analyze data, treat patients, and conduct research. This study aims to provide a comprehensive overview of LLM applications in BHI, highlighting their transformative potential and addressing the associated ethical and practical challenges. We reviewed 1698 research articles from January 2022 to December 2023, categorizing them by research themes and diagnostic categories. Additionally, we conducted network analysis to map scholarly collaborations and research dynamics. Our findings reveal a substantial increase in the potential applications of LLMs to a variety of BHI tasks, including clinical decision support, patient interaction, and medical document analysis. Notably, LLMs are expected to be instrumental in enhancing the accuracy of diagnostic tools and patient care protocols. The network analysis highlights dense and dynamically evolving collaborations across institutions, underscoring the interdisciplinary nature of LLM research in BHI. A significant trend was the application of LLMs in managing specific disease categories, such as mental health and neurological disorders, demonstrating their potential to influence personalized medicine and public health strategies. LLMs hold promising potential to further transform biomedical research and healthcare delivery. While promising, the ethical implications and challenges of model validation call for rigorous scrutiny to optimize their benefits in clinical settings. This survey serves as a resource for stakeholders in healthcare, including researchers, clinicians, and policymakers, to understand the current state and future potential of LLMs in BHI.
Title: Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis. Authors: Huizi Yu, Lizhou Fan, Lingyao Li, Jiayan Zhou, Zihui Ma, Lu Xian, Wenyue Hua, Sijia He, Mingyu Jin, Yongfeng Zhang, Ashvin Gandhi, Xin Ma. Pub Date: 2024-09-14; DOI: 10.1007/s41666-024-00171-8. Journal of Healthcare Informatics Research, vol. 8, no. 4, pp. 658-711. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499577/pdf/
Pub Date: 2024-08-01; eCollection Date: 2024-09-01; DOI: 10.1007/s41666-024-00169-2
Soheila Molaei, Nima Ghanbari Bousejin, Ghadeer O Ghosheh, Anshul Thakur, Vinod Kumar Chauhan, Tingting Zhu, David A Clifton
Electronic Health Records (EHRs) play a crucial role in shaping predictive care models, yet they suffer from significant data gaps and class imbalances. Traditional Graph Neural Network (GNN) approaches either fail to fully leverage neighbourhood data or demand intensive computation for regularisation. To address these challenges, we introduce CliqueFluxNet, a novel framework that constructs a patient similarity graph to maximise cliques, thereby highlighting strong inter-patient connections. At the heart of CliqueFluxNet lies its stochastic edge fluxing strategy: a dynamic process involving random edge addition and removal during training. This strategy aims to enhance the model's generalisability and mitigate overfitting. Our empirical analysis, conducted on the MIMIC-III and eICU datasets, focuses on the tasks of mortality and readmission prediction. It demonstrates significant progress in representation learning, particularly in scenarios with limited data availability. Qualitative assessments further underscore CliqueFluxNet's effectiveness in extracting meaningful EHR representations, solidifying its potential for advancing GNN applications in healthcare analytics.
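The idea of randomly adding and removing edges during training can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name `flux_edges`, the parameters `drop_prob` and `add_count`, and the toy graph are all hypothetical, and the abstract does not specify the actual fluxing schedule or how cliques are maximised.

```python
import random


def flux_edges(edges, num_nodes, drop_prob, add_count, rng):
    """Return a perturbed copy of an undirected edge set.

    Edges are stored as (u, v) tuples with u < v. Each existing edge is
    dropped with probability drop_prob, then add_count edges that were
    absent from the original graph are added at random.
    """
    kept = {edge for edge in sorted(edges) if rng.random() >= drop_prob}
    absent = [
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if (u, v) not in edges
    ]
    added = set(rng.sample(absent, min(add_count, len(absent))))
    return kept | added


rng = random.Random(0)  # fixed seed so the sketch is reproducible
base_edges = {(0, 1), (0, 2), (1, 2), (2, 3)}  # toy patient similarity graph
for epoch in range(3):
    epoch_edges = flux_edges(base_edges, num_nodes=5, drop_prob=0.2, add_count=1, rng=rng)
    # A GNN training step would consume epoch_edges here.
    print(epoch, sorted(epoch_edges))
```

Because the base graph is never modified, each epoch sees a fresh perturbation of the same underlying structure, which is what lets the perturbation act as a regulariser rather than as permanent graph editing.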
Title: CliqueFluxNet: Unveiling EHR Insights with Stochastic Edge Fluxing and Maximal Clique Utilisation Using Graph Neural Networks. Journal of Healthcare Informatics Research, vol. 8, no. 3, pp. 555-575. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310186/pdf/
Pub Date: 2024-06-15; eCollection Date: 2024-09-01; DOI: 10.1007/s41666-024-00168-3
Cuong V Nguyen, Hieu Minh Duong, Cuong D Do
In practical electrocardiography (ECG) interpretation, the scarcity of well-annotated data is a common challenge. Transfer learning techniques are valuable in such situations, yet the assessment of transferability has received limited attention. To tackle this issue, we introduce MELEP, which stands for Multi-label Expected Log of Empirical Predictions, a measure designed to estimate the effectiveness of knowledge transfer from a pre-trained model to a downstream multi-label ECG diagnosis task. MELEP is generic, working with new target data with different label sets, and computationally efficient, requiring only a single forward pass through the pre-trained model. To the best of our knowledge, MELEP is the first transferability metric specifically designed for multi-label ECG classification problems. Our experiments show that MELEP can predict the performance of pre-trained convolutional and recurrent deep neural networks on small and imbalanced ECG data. Specifically, we observed strong correlation coefficients (with absolute values exceeding 0.6 in most cases) between MELEP and the actual average F1 scores of the fine-tuned models. Our work highlights the potential of MELEP to expedite the selection of suitable pre-trained models for ECG diagnosis tasks, saving time and effort that would otherwise be spent on fine-tuning these models.
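The abstract does not give the MELEP formula. As a hedged illustration of the "expected log of empirical predictions" idea, the sketch below computes the single-label LEEP score (log expected empirical prediction), on which a multi-label measure of this kind could plausibly build. That connection is an assumption, as are the function name and the toy inputs; how MELEP aggregates over multiple labels is not shown here.

```python
import math


def leep_score(source_probs, target_labels, num_target_classes):
    """Single-label LEEP from one forward pass of a pre-trained model.

    source_probs[i] is the model's predicted distribution over its own
    (source) classes for target example i; target_labels[i] is that
    example's target class index.
    """
    n = len(target_labels)
    num_source = len(source_probs[0])

    # Empirical joint distribution P(y, z) over target label y and source class z.
    joint = [[0.0] * num_source for _ in range(num_target_classes)]
    for probs, y in zip(source_probs, target_labels):
        for z, p in enumerate(probs):
            joint[y][z] += p / n

    # Empirical conditional P(y | z) = P(y, z) / P(z).
    marginal_z = [sum(joint[y][z] for y in range(num_target_classes)) for z in range(num_source)]
    cond = [
        [joint[y][z] / marginal_z[z] if marginal_z[z] > 0 else 0.0 for z in range(num_source)]
        for y in range(num_target_classes)
    ]

    # Average log-likelihood of the empirical predictor on the target data.
    total = 0.0
    for probs, y in zip(source_probs, target_labels):
        expected = sum(cond[y][z] * probs[z] for z in range(num_source))
        total += math.log(expected)
    return total / n


labels = [0, 0, 1, 1]
informative = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
uninformative = [[0.5, 0.5]] * 4
print(leep_score(informative, labels, 2), leep_score(uninformative, labels, 2))
```

A score closer to zero indicates that the pre-trained model's outputs are more predictive of the target labels, which is why such a score can rank candidate models without fine-tuning them.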
Title: MELEP: A Novel Predictive Measure of Transferability in Multi-label ECG Diagnosis. Journal of Healthcare Informatics Research, vol. 8, no. 3, pp. 506-522. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310184/pdf/
Pub Date: 2024-05-28; eCollection Date: 2024-09-01; DOI: 10.1007/s41666-024-00167-4
Ruhan Liu, Jiajia Li, Yang Wen, Huating Li, Ping Zhang, Bin Sheng, David Dagan Feng
Understanding and addressing the dynamics of infectious diseases, such as coronavirus disease 2019, are essential for effectively managing the current situation and developing intervention strategies. Epidemiologists commonly use mathematical models, known as epidemiological equations (EE), to simulate disease spread. However, accurately estimating the parameters of these models can be challenging due to factors like variations in social distancing policies and intervention strategies. In this study, we propose a novel method called deep dynamic epidemiological modeling (DDE) to address these challenges. The DDE method combines the strengths of EE with the capabilities of deep neural networks to improve the accuracy of fitting real-world data. In DDE, we apply neural ordinary differential equations to solve variant-specific equations, ensuring a more precise fit for disease progression in different geographic regions. In the experiment, we tested the performance of the DDE method and other state-of-the-art methods using real-world data from five diverse geographic entities: the USA, Colombia, South Africa, Wuhan in China, and Piedmont in Italy. Compared to the state-of-the-art method, DDE significantly improved accuracy, with an average fitting Pearson coefficient exceeding 0.97 across the five geographic entities. In summary, the DDE method enhances the accuracy of parameter fitting in epidemiological models and provides a foundation for constructing simpler models adaptable to different geographic areas.
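To make the notion of epidemiological equations concrete, here is a minimal sketch of the classic SIR compartmental model integrated with forward Euler steps. The choice of SIR, the parameter values, and the function name are assumptions for illustration; the abstract does not state which compartmental model DDE uses, and the neural ODE component that would learn the parameters is not reproduced here.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.

    Returns one (S, I, R) tuple per day, including day 0.
    """
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters: transmission rate 0.3/day, recovery rate 0.1/day.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=60)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("peak infections on day", peak_day)
```

With fixed `beta` and `gamma` the curve cannot follow changes in policy or behaviour, which is the limitation the abstract attributes to plain epidemiological equations and the motivation for letting a network adjust the dynamics.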
Title: DDE: Deep Dynamic Epidemiological Modeling for Infectious Illness Development Forecasting in Multi-level Geographic Entities. Journal of Healthcare Informatics Research, vol. 8, no. 3, pp. 478-505. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310392/pdf/
Pub Date: 2024-03-23; eCollection Date: 2024-06-01; DOI: 10.1007/s41666-024-00164-7
Tongnian Wang, Kai Zhang, Jiannan Cai, Yanmin Gong, Kim-Kwang Raymond Choo, Yuanxiong Guo
As machine learning (ML) usage becomes more popular in the healthcare sector, there are also increasing concerns about potential biases and risks such as privacy. One countermeasure is to use federated learning (FL) to support collaborative learning without the need for patient data sharing across different organizations. However, the inherent heterogeneity of data distributions among participating FL parties poses challenges for exploring group fairness in FL. While personalization within FL can handle performance degradation caused by data heterogeneity, its influence on group fairness is not fully investigated. Therefore, the primary focus of this study is to rigorously assess the impact of personalized FL on group fairness in the healthcare domain, offering a comprehensive understanding of how personalized FL affects group fairness in clinical outcomes. We conduct an empirical analysis using two prominent real-world Electronic Health Records (EHR) datasets, namely eICU and MIMIC-IV. Our methodology involves a thorough comparison between personalized FL and two baselines: standalone training, where models are developed independently without FL collaboration, and standard FL, which aims to learn a global model via the FedAvg algorithm. We adopt Ditto as our personalized FL approach, which enables each client in FL to develop its own personalized model through multi-task learning. Our assessment is achieved through a series of evaluations, comparing the predictive performance (i.e., AUROC and AUPRC) and fairness gaps (i.e., EOPP, EOD, and DP) of these methods. Personalized FL demonstrates superior predictive accuracy and fairness over standalone training across both datasets. Nevertheless, in comparison with standard FL, personalized FL shows improved predictive accuracy but does not consistently offer better fairness outcomes. 
For instance, in the 24-h in-hospital mortality prediction task, personalized FL achieves an average EOD of 27.4% across racial groups in the eICU dataset and 47.8% in MIMIC-IV. In comparison, standard FL records a better EOD of 26.2% for eICU and 42.0% for MIMIC-IV, while standalone training yields significantly worse EOD of 69.4% and 54.7% on these datasets, respectively. Our analysis reveals that personalized FL has the potential to enhance fairness in comparison to standalone training, yet it does not consistently ensure fairness improvements compared to standard FL. Our findings also show that while personalization can improve fairness for more biased hospitals (i.e., hospitals having larger fairness gaps in standalone training), it can exacerbate fairness issues for less biased ones. These insights suggest that the integration of personalized FL with additional strategic designs could be key to simultaneously boosting prediction accuracy and reducing fairness disparities. The findings and opportunities outlined in this paper can inform the research agenda for future studies, to overcome the limitations and fur
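The fairness gaps named in the abstract above (DP, EOPP, EOD) can be illustrated with a small sketch on binary predictions. The definitions used here are common ones and are an assumption, since the abstract does not define them: demographic parity as the gap in positive-prediction rate, equal opportunity as the gap in true-positive rate, and equalized odds as the larger of the true-positive and false-positive rate gaps. The data and function names are hypothetical.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group positive-prediction rate, TPR, and FPR for binary labels.

    Assumes every group contains at least one positive and one negative example.
    """
    stats = {}
    for g in sorted(set(groups)):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        pos = [k for k in idx if y_true[k] == 1]
        neg = [k for k in idx if y_true[k] == 0]
        stats[g] = {
            "positive_rate": sum(y_pred[k] for k in idx) / len(idx),
            "tpr": sum(y_pred[k] for k in pos) / len(pos),
            "fpr": sum(y_pred[k] for k in neg) / len(neg),
        }
    return stats


def gap(stats, key):
    """Largest difference in a rate between any two groups."""
    values = [s[key] for s in stats.values()]
    return max(values) - min(values)


def fairness_gaps(y_true, y_pred, groups):
    stats = group_rates(y_true, y_pred, groups)
    eopp = gap(stats, "tpr")
    return {
        "DP": gap(stats, "positive_rate"),
        "EOPP": eopp,
        "EOD": max(eopp, gap(stats, "fpr")),  # one common convention
    }


# Toy data: two groups of four patients each (illustrative only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gaps = fairness_gaps(y_true, y_pred, groups)
print(gaps)
```

The toy data shows why several gaps are reported together: both groups receive positive predictions at the same rate, so DP is zero, yet the error rates differ between groups and the EOPP and EOD gaps are not.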
{"title":"Analyzing the Impact of Personalization on Fairness in Federated Learning for Healthcare.","authors":"Tongnian Wang, Kai Zhang, Jiannan Cai, Yanmin Gong, Kim-Kwang Raymond Choo, Yuanxiong Guo","doi":"10.1007/s41666-024-00164-7","DOIUrl":"10.1007/s41666-024-00164-7","url":null,"abstract":"<p><p>As machine learning (ML) usage becomes more popular in the healthcare sector, there are also increasing concerns about potential biases and risks such as privacy. One countermeasure is to use federated learning (FL) to support collaborative learning without the need for patient data sharing across different organizations. However, the inherent heterogeneity of data distributions among participating FL parties poses challenges for exploring group fairness in FL. While personalization within FL can handle performance degradation caused by data heterogeneity, its influence on group fairness is not fully investigated. Therefore, the primary focus of this study is to rigorously assess the impact of personalized FL on group fairness in the healthcare domain, offering a comprehensive understanding of how personalized FL affects group fairness in clinical outcomes. We conduct an empirical analysis using two prominent real-world Electronic Health Records (EHR) datasets, namely eICU and MIMIC-IV. Our methodology involves a thorough comparison between personalized FL and two baselines: standalone training, where models are developed independently without FL collaboration, and standard FL, which aims to learn a global model via the FedAvg algorithm. We adopt Ditto as our personalized FL approach, which enables each client in FL to develop its own personalized model through multi-task learning. Our assessment is achieved through a series of evaluations, comparing the predictive performance (i.e., AUROC and AUPRC) and fairness gaps (i.e., EOPP, EOD, and DP) of these methods. 
Personalized FL demonstrates superior predictive accuracy and fairness over standalone training across both datasets. Nevertheless, in comparison with standard FL, personalized FL shows improved predictive accuracy but does not consistently offer better fairness outcomes. For instance, in the 24-h in-hospital mortality prediction task, personalized FL achieves an average EOD of 27.4% across racial groups in the eICU dataset and 47.8% in MIMIC-IV. In comparison, standard FL records a better EOD of 26.2% for eICU and 42.0% for MIMIC-IV, while standalone training yields significantly worse EOD of 69.4% and 54.7% on these datasets, respectively. Our analysis reveals that personalized FL has the potential to enhance fairness in comparison to standalone training, yet it does not consistently ensure fairness improvements compared to standard FL. Our findings also show that while personalization can improve fairness for more biased hospitals (i.e., hospitals having larger fairness gaps in standalone training), it can exacerbate fairness issues for less biased ones. These insights suggest that the integration of personalized FL with additional strategic designs could be key to simultaneously boosting prediction accuracy and reducing fairness disparities. 
The findings and opportunities outlined in this paper can inform the research agenda for future studies, to overcome the limitations and fur","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"8 2","pages":"181-205"},"PeriodicalIF":5.4,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11052754/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140856928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
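The fairness gaps named in the abstract above (EOPP, EOD, and DP) can be illustrated with a minimal sketch. It reads the abbreviations as equal opportunity, equalized odds, and demographic parity gaps, which is an assumption because the abstract does not expand them. It also assumes each gap is the maximum minus the minimum of a per-group rate, and that EOD is the larger of the true-positive-rate and false-positive-rate gaps; the paper may define them differently. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group positive-prediction rate, TPR, and FPR for binary 0/1 labels."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        if t == 1:
            c["pos"] += 1
            c["tp"] += p
        else:
            c["neg"] += 1
            c["fp"] += p
    rates = {}
    for g, c in counts.items():
        if c["pos"] == 0 or c["neg"] == 0:
            # TPR or FPR is undefined without both classes in the group.
            raise ValueError(f"group {g!r} needs both positive and negative examples")
        rates[g] = {
            "ppr": c["pred_pos"] / c["n"],  # share predicted positive
            "tpr": c["tp"] / c["pos"],      # true positive rate
            "fpr": c["fp"] / c["neg"],      # false positive rate
        }
    return rates


def gap(values):
    """Largest difference between any two groups (assumed gap definition)."""
    values = list(values)
    return max(values) - min(values)


def fairness_gaps(y_true, y_pred, groups):
    rates = group_rates(y_true, y_pred, groups)
    tpr_gap = gap(r["tpr"] for r in rates.values())
    fpr_gap = gap(r["fpr"] for r in rates.values())
    return {
        "DP": gap(r["ppr"] for r in rates.values()),  # demographic parity gap
        "EOPP": tpr_gap,                              # equal opportunity gap
        "EOD": max(tpr_gap, fpr_gap),                 # equalized odds gap (assumed: max of the two)
    }


# Toy data: two groups of four patients each (illustrative only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gaps = fairness_gaps(y_true, y_pred, groups)
```

On this toy input, group "a" has a higher true positive rate than group "b" while both share the same false positive rate, so the EOPP and EOD gaps coincide and the DP gap is smaller.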
Pub Date : 2024-02-16 eCollection Date: 2024-06-01 DOI: 10.1007/s41666-024-00160-x
Zakary Georgis-Yap, Milos R Popovic, Shehroz S Khan
Epilepsy affects more than 50 million people worldwide, making it one of the world's most prevalent neurological diseases. The main symptom of epilepsy is seizures, which occur abruptly and can cause serious injury or death. The ability to predict the occurrence of an epileptic seizure could alleviate many risks and stresses people with epilepsy face. We formulate the problem as detecting preictal (or pre-seizure) EEG, with reference to normal EEG, as a precursor to an incoming seizure. To this end, we developed several supervised deep learning approaches to identify preictal EEG from normal EEG. We further developed novel unsupervised deep learning approaches that train the models on only normal EEG and detect pre-seizure EEG as an anomalous event. These deep learning models were trained and evaluated on two large EEG seizure datasets in a person-specific manner. We found that both supervised and unsupervised approaches are feasible; however, their performance varies depending on the patient, approach and architecture. This new line of research has the potential to develop therapeutic interventions and save human lives.
{"title":"Supervised and Unsupervised Deep Learning Approaches for EEG Seizure Prediction.","authors":"Zakary Georgis-Yap, Milos R Popovic, Shehroz S Khan","doi":"10.1007/s41666-024-00160-x","DOIUrl":"10.1007/s41666-024-00160-x","url":null,"abstract":"<p><p>Epilepsy affects more than <b>50</b> million people worldwide, making it one of the world's most prevalent neurological diseases. The main symptom of epilepsy is seizures, which occur abruptly and can cause serious injury or death. The ability to predict the occurrence of an epileptic seizure could alleviate many risks and stresses people with epilepsy face. We formulate the problem as detecting preictal (or pre-seizure) EEG, with reference to normal EEG, as a precursor to an incoming seizure. To this end, we developed several supervised deep learning approaches to identify preictal EEG from normal EEG. We further developed novel unsupervised deep learning approaches that train the models on only normal EEG and detect pre-seizure EEG as an anomalous event. These deep learning models were trained and evaluated on two large EEG seizure datasets in a person-specific manner. We found that both supervised and unsupervised approaches are feasible; however, their performance varies depending on the patient, approach and architecture. 
This new line of research has the potential to develop therapeutic interventions and save human lives.</p>","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"8 2","pages":"286-312"},"PeriodicalIF":5.4,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11052752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140875121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
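The unsupervised framing in the abstract above (fit on normal EEG only, then flag pre-seizure EEG as anomalous) can be sketched in a few lines. This is not the authors' method: it swaps their deep learning models for a simple mean-energy z-score so the idea fits in a short example. The class name, the energy feature, the threshold of 3.0, and the toy windows are all hypothetical.

```python
import statistics


def window_energy(window):
    """Mean squared amplitude of one window of samples (stand-in feature)."""
    return sum(x * x for x in window) / len(window)


class NormalOnlyDetector:
    """Fits on normal windows only; scores new windows by deviation from them."""

    def fit(self, normal_windows):
        scores = [window_energy(w) for w in normal_windows]
        self.mean = statistics.fmean(scores)
        # Guard against zero spread when all training windows are identical.
        self.std = statistics.pstdev(scores) or 1e-12
        return self

    def score(self, window):
        """Absolute z-score of the window's energy relative to normal data."""
        return abs(window_energy(window) - self.mean) / self.std

    def is_anomalous(self, window, threshold=3.0):
        # The threshold is an assumed value; in practice it would be tuned per person.
        return self.score(window) > threshold


# Toy "normal" windows for one person (illustrative values, not real EEG).
normal = [
    [1.0, -1.0, 1.0, -1.0],
    [1.1, -0.9, 1.0, -1.0],
    [0.9, -1.0, 1.1, -1.0],
]
detector = NormalOnlyDetector().fit(normal)
flag_normal_like = detector.is_anomalous([1.0, -1.0, 1.0, -1.0])
flag_high_energy = detector.is_anomalous([3.0, -3.0, 3.0, -3.0])
```

Fitting one detector per person mirrors the person-specific training described in the abstract; no labeled pre-seizure data is needed at fit time.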
Pub Date : 2024-01-05 DOI: 10.1007/s41666-023-00155-0
Cathy Shyr, Yan Hu, L. Bastarache, Alex Cheng, Rizwan Hamid, Paul Harris, Hua Xu
{"title":"Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models","authors":"Cathy Shyr, Yan Hu, L. Bastarache, Alex Cheng, Rizwan Hamid, Paul Harris, Hua Xu","doi":"10.1007/s41666-023-00155-0","DOIUrl":"https://doi.org/10.1007/s41666-023-00155-0","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"54 2","pages":"1-24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139381790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-03 DOI: 10.1007/s41666-023-00157-y
Dinithi Vithanage, Ping Yu, Lei Wang, Chao Deng
{"title":"Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study","authors":"Dinithi Vithanage, Ping Yu, Lei Wang, Chao Deng","doi":"10.1007/s41666-023-00157-y","DOIUrl":"https://doi.org/10.1007/s41666-023-00157-y","url":null,"abstract":"","PeriodicalId":101413,"journal":{"name":"Journal of healthcare informatics research","volume":"22 7","pages":"1-22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139389543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}