The integration of large language models (LLMs) into healthcare settings holds great promise for improving clinical workflow efficiency and enhancing patient care, with the potential to automate tasks such as text summarisation during consultations. The fidelity between LLM outputs and ground-truth information is therefore paramount in healthcare: errors in medical summary generation can cause miscommunication between patients and clinicians, resulting in incorrect diagnosis and treatment decisions and compromising patient safety. LLMs are well known to produce a variety of errors. Currently, there is no established clinical framework for assessing the safety and accuracy of LLM-generated medical text. We have developed a new approach to: a) categorise LLM errors within the clinical documentation context, b) establish clinical safety metrics for the live usage phase, and c) suggest a framework, named CREOLA, for assessing the safety risk of errors. We present clinical error metrics over 18 different LLM experimental configurations for the clinical note generation task, comprising 12,999 clinician-annotated sentences. We illustrate the utility of our platform CREOLA for iterating over LLM architectures with two experiments. Overall, we find that our best-performing experiments outperform both previously reported model error rates in the note generation literature and human annotators. Our suggested framework can be used to assess the accuracy and safety of LLM output in the clinical context.
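The per-category error metrics reported over the clinician-annotated sentences suggest a simple tabulation step; a minimal sketch, assuming a hypothetical label set (this is not CREOLA's actual error taxonomy, which the abstract does not enumerate):

```python
from collections import Counter

# Hypothetical per-sentence clinician annotations; the label names are
# illustrative, not CREOLA's actual taxonomy.
annotations = ["correct", "hallucination", "correct", "omission",
               "correct", "hallucination", "correct", "correct"]

def error_rates(labels):
    """Rate of each error category over all annotated sentences."""
    counts = Counter(labels)
    n = len(labels)
    return {cat: counts[cat] / n for cat in counts if cat != "correct"}

print(error_rates(annotations))  # {'hallucination': 0.25, 'omission': 0.125}
```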
{"title":"A Framework to Assess Clinical Safety and Hallucination Rates of LLMs for Medical Text Summarisation","authors":"Elham Asgari, Nina Montana-Brown, Magda Dubois, Saleh Khalil, Jasmine Balloch, Dominic Pimenta","doi":"10.1101/2024.09.12.24313556","DOIUrl":"https://doi.org/10.1101/2024.09.12.24313556","url":null,"abstract":"The integration of large language models (LLMs) into healthcare settings holds great promise for improving clinical workflow efficiency and enhancing patient care, with the potential to automate tasks such as text summarisation during consultations. The fidelity between LLM outputs and ground truth information is therefore paramount in healthcare, as errors in medical summary generation can lead to miscommunication between patients and clinicians, leading to incorrect diagnosis and treatment decisions and compromising patient safety. LLMs are well-known to produce a variety of errors. Currently, there is no established clinical framework for assessing the safety and accuracy of LLM-generated medical text.\u0000We have developed a new approach to: a) categorise LLM errors within the clinical documentation context, b) establish clinical safety metrics for the live usage phase, and c) suggest a framework named CREOLA for assessing the safety risk for errors. We present clinical error metrics over 18 different LLM experimental configurations for the clinical note generation task, consisting of 12,999 clinician-annotated sentences. We illustrate the utility of using our platform CREOLA for iteration over LLM architectures with two experiments. Overall, we find our best-performing experiments outperform previously reported model error rates in the note generation literature, and additionally outperform human annotators. 
Our suggested framework can be used to assess the accuracy and safety of LLM output in the clinical context.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-13. DOI: 10.1101/2024.09.12.24313594
Xinmeng Zhang, Chao Yan, Yuyang Yang, Zhuohang Li, Yubo Feng, Bradley A. Malin, You Chen
Electronic Health Record (EHR) audit log data are increasingly utilized for clinical tasks, from workflow modeling to predictive analyses of discharge events, adverse kidney outcomes, and hospital readmissions. These data encapsulate user-EHR interactions, reflecting both healthcare professionals' behavior and patients' health statuses. To harness this temporal information effectively, this study explores the application of Large Language Models (LLMs) in leveraging audit log data for clinical prediction tasks, specifically focusing on discharge predictions. Utilizing a year's worth of EHR data from Vanderbilt University Medical Center, we fine-tuned LLMs on 10,000 randomly selected training examples. Our findings reveal that LLaMA-2 70B, with an AUROC of 0.80 [0.77-0.82], outperforms both GPT-4 128K in a zero-shot setting, with an AUROC of 0.68 [0.65-0.71], and DeBERTa, with an AUROC of 0.78 [0.75-0.82]. Among various serialization methods, the first-occurrence approach (wherein only the initial appearance of each event in a sequence is retained) shows superior performance. Furthermore, for the fine-tuned LLaMA-2 70B, logit outputs yield a higher AUROC of 0.80 [0.77-0.82] compared to text outputs, with an AUROC of 0.69 [0.67-0.72]. This study underscores the potential of fine-tuned LLMs, particularly when combined with strategic sequence serialization, in advancing clinical prediction tasks.
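The first-occurrence serialization described above is straightforward to sketch; the audit-log event names here are invented for illustration:

```python
def first_occurrence(events):
    """Keep only the first appearance of each event in an audit-log
    sequence, preserving order -- the serialization strategy the
    abstract reports as best-performing."""
    seen = set()
    out = []
    for e in events:
        if e not in seen:
            seen.add(e)
            out.append(e)
    return out

# Example: a repetitive EHR audit-log trace collapses to its distinct
# actions in order of first use (event names are hypothetical).
trace = ["open_chart", "view_labs", "open_chart", "order_med", "view_labs"]
print(first_occurrence(trace))  # ['open_chart', 'view_labs', 'order_med']
```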
{"title":"Optimizing Large Language Models for Discharge Prediction: Best Practices in Leveraging Electronic Health Record Audit Logs","authors":"Xinmeng Zhang, Chao Yan, Yuyang Yang, Zhuohang Li, Yubo Feng, Bradley A. Malin, You Chen","doi":"10.1101/2024.09.12.24313594","DOIUrl":"https://doi.org/10.1101/2024.09.12.24313594","url":null,"abstract":"Electronic Health Record (EHR) audit log data are increasingly utilized for clinical tasks, from workflow modeling to predictive analyses of discharge events, adverse kidney outcomes, and hospital readmissions. These data encapsulate user-EHR interactions, reflecting both healthcare professionals' behavior and patients' health statuses. To harness this temporal information effectively, this study explores the application of Large Language Models (LLMs) in leveraging audit log data for clinical prediction tasks, specifically focusing on discharge predictions. Utilizing a year's worth of EHR data from Vanderbilt University Medical Center, we fine-tuned LLMs with randomly selected 10,000 training examples. Our findings reveal that LLaMA-2 70B, with an AUROC of 0.80 [0.77-0.82], outperforms both GPT-4 128K in a zero-shot, with an AUROC of 0.68 [0.65-0.71], and DeBERTa, with an AUROC of 0.78 [0.75-0.82]. Among various serialization methods, the first-occurrence approach — wherein only the initial appearance of each event in a sequence is retained — shows superior performance. Furthermore, for the fine-tuned LLaMA-2 70B, logit outputs yield a higher AUROC of 0.80 [0.77-0.82] compared to text outputs, with an AUROC of 0.69 [0.67-0.72]. 
This study underscores the potential of fine-tuned LLMs, particularly when combined with strategic sequence serialization, in advancing clinical prediction tasks.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-12. DOI: 10.1101/2024.09.11.24313513
Yu Hou, Rui Zhang
Objective: To enhance the accuracy and reliability of dietary supplement (DS) question answering by integrating a novel Retrieval-Augmented Generation (RAG) LLM system with an updated, integrated DS knowledge base and a user-friendly interface. Materials and Methods: We developed iDISK2.0 by integrating updated data from multiple trusted sources, including NMCD, MSKCC, DSLD, and NHPD, and applied advanced integration strategies to reduce noise. We then coupled iDISK2.0 with a RAG system, leveraging the strengths of large language models (LLMs) and a biomedical knowledge graph (BKG) to address the hallucination issues inherent in standalone LLMs. The system enhances answer generation by using LLMs (GPT-4.0) to retrieve contextually relevant subgraphs from the BKG based on entities identified in the query. A user-friendly interface was built to facilitate easy access to DS knowledge through conversational text inputs. Results: iDISK2.0 encompasses 174,317 entities across seven types, six types of relationships, and 471,063 attributes. The iDISK2.0-RAG system significantly improved the accuracy of DS-related information retrieval. Our evaluations showed that the system achieved over 95% accuracy in answering True/False and multiple-choice questions, outperforming standalone LLMs. Additionally, the user-friendly interface enabled efficient interaction, allowing users to input free-form text queries and receive accurate, contextually relevant responses. The integration process minimized data noise and ensured the most up-to-date and comprehensive DS information was available to users. Conclusion: The integration of iDISK2.0 with a RAG system effectively addresses the limitations of LLMs, providing a robust solution for accurate DS information retrieval.
This study underscores the importance of combining structured knowledge graphs with advanced language models to enhance the precision and reliability of information retrieval systems, ultimately supporting better-informed decisions in DS-related research and healthcare.
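The entity-to-subgraph retrieval step described in the Methods can be sketched as follows; the triples, schema, and prompt format are illustrative assumptions, not iDISK2.0's actual data model or interface:

```python
# Toy knowledge graph as (head, relation, tail) triples; the entries and
# relation names are invented, not iDISK2.0's actual schema.
TRIPLES = [
    ("St. John's Wort", "interacts_with", "sertraline"),
    ("St. John's Wort", "has_ingredient", "hypericin"),
    ("fish oil", "is_effective_for", "hypertriglyceridemia"),
]

def retrieve_subgraph(entities, triples=TRIPLES):
    """Return every triple touching a recognised entity; these facts
    would then be serialised into the LLM prompt as grounding context."""
    wanted = {e.lower() for e in entities}
    return [t for t in triples
            if t[0].lower() in wanted or t[2].lower() in wanted]

def build_prompt(question, entities):
    """Ground the LLM's answer in the retrieved subgraph (toy format)."""
    facts = "; ".join(f"{h} {r} {t}" for h, r, t in retrieve_subgraph(entities))
    return f"Answer using only these facts: {facts}\nQuestion: {question}"

print(build_prompt("Does St. John's Wort interact with sertraline?",
                   ["St. John's Wort"]))
```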
{"title":"Enhancing Dietary Supplement Question Answer via Retrieval-Augmented Generation (RAG) with LLM","authors":"Yu Hou, Rui Zhang","doi":"10.1101/2024.09.11.24313513","DOIUrl":"https://doi.org/10.1101/2024.09.11.24313513","url":null,"abstract":"Objective: To enhance the accuracy and reliability of dietary supplement (DS) question answering by integrating a novel Retrieval-Augmented Generation (RAG) LLM system with an updated and integrated DS knowledge base and providing a user-friendly interface. With.\u0000Materials and Methods: We developed iDISK2.0 by integrating updated data from multiple trusted sources, including NMCD, MSKCC, DSLD, and NHPD, and applied advanced integration strategies to reduce noise. We then applied the iDISK2.0 with a RAG system, leveraging the strengths of large language models (LLMs) and a biomedical knowledge graph (BKG) to address the hallucination issues inherent in standalone LLMs. The system enhances answer generation by using LLMs (GPT-4.0) to retrieve contextually relevant subgraphs from the BKG based on identified entities in the query. A user-friendly interface was built to facilitate easy access to DS knowledge through conversational text inputs.\u0000Results: The iDISK2.0 encompasses 174,317 entities across seven types, six types of relationships, and 471,063 attributes. The iDISK2.0-RAG system significantly improved the accuracy of DS-related information retrieval. Our evaluations showed that the system achieved over 95% accuracy in answering True/False and multiple-choice questions, outperforming standalone LLMs. Additionally, the user-friendly interface enabled efficient interaction, allowing users to input free-form text queries and receive accurate, contextually relevant responses. 
The integration process minimized data noise and ensured the most up-to-date and comprehensive DS information was available to users.\u0000Conclusion: The integration of iDISK2.0 with an RAG system effectively addresses the limitations of LLMs, providing a robust solution for accurate DS information retrieval. This study underscores the importance of combining structured knowledge graphs with advanced language models to enhance the precision and reliability of information retrieval systems, ultimately supporting better-informed decisions in DS-related research and healthcare.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-11. DOI: 10.1101/2024.09.09.24312921
Jack Manners, Eva Kemps, Bastien Lechat, Peter Catcheside, Danny Eckert, Hannah Scott
Consumer sleep trackers provide useful insight into sleep. However, large-scale performance evaluation studies are needed to properly understand sleep tracker accuracy. This study evaluated the performance of an under-mattress sensor in estimating sleep and wake versus polysomnography in a large sample, including individuals with and without sleep disorders and during day versus night sleep opportunities, across multiple in-laboratory studies. 183 participants (51/49% male/female, mean [SD] age = 45 [18] years) attended the sleep laboratory for a research study including simultaneous polysomnography and under-mattress sensor (Withings Sleep Analyzer [WSA]) recordings. Epoch-by-epoch analyses determined accuracy, sensitivity, and specificity of the WSA versus polysomnography. Bland-Altman plots examined bias in sleep duration, efficiency, onset latency, and wake after sleep onset. Overall WSA sleep-wake classification accuracy was 83%, sensitivity 95%, and specificity 37%. The WSA significantly overestimated total sleep time (48 [81] minutes), sleep efficiency (9 [15]%), and sleep onset latency (6 [26] minutes), and underestimated wake after sleep onset (54 [78] minutes). Accuracy and specificity were higher for night versus daytime sleep opportunities in healthy individuals (89% and 47% versus 82% and 26%, respectively; p<0.05). Accuracy and sensitivity were also higher for healthy individuals (89% and 97%) versus those with sleep disorders (81% and 91%; p<0.05). WSA performance is comparable to other consumer sleep trackers, with high sensitivity but poor specificity compared to polysomnography. WSA performance was reasonably stable, but more variable during daytime sleep opportunities and in people with a sleep disorder. Contactless, under-mattress sleep sensors show promise for accurate sleep monitoring, noting the tendency to overestimate sleep, particularly where wake time is high.
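Epoch-by-epoch accuracy, sensitivity, and specificity follow directly from paired sleep/wake labels; a minimal sketch with invented epoch data (1 = sleep, 0 = wake, polysomnography as ground truth):

```python
def epoch_metrics(reference, device):
    """Accuracy, sensitivity (sleep detected as sleep), and specificity
    (wake detected as wake) from paired epoch labels: 1 = sleep, 0 = wake."""
    tp = sum(1 for r, d in zip(reference, device) if r == 1 and d == 1)
    tn = sum(1 for r, d in zip(reference, device) if r == 0 and d == 0)
    fp = sum(1 for r, d in zip(reference, device) if r == 0 and d == 1)
    fn = sum(1 for r, d in zip(reference, device) if r == 1 and d == 0)
    return {
        "accuracy": (tp + tn) / len(reference),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Invented example epochs mirroring the reported pattern: a sensor that
# rarely calls wake scores high sensitivity but low specificity.
psg =    [1, 1, 1, 0, 0, 1, 1, 0]   # polysomnography (ground truth)
sensor = [1, 1, 1, 1, 0, 1, 1, 1]   # under-mattress sensor
print(epoch_metrics(psg, sensor))
```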
{"title":"Performance evaluation of an under-mattress sleep sensor versus polysomnography in >400 nights with healthy and unhealthy sleep","authors":"Jack Manners, Eva Kemps, Bastien Lechat, Peter Catcheside, Danny Eckert, Hannah Scott","doi":"10.1101/2024.09.09.24312921","DOIUrl":"https://doi.org/10.1101/2024.09.09.24312921","url":null,"abstract":"Consumer sleep trackers provide useful insight into sleep. However, large scale performance evaluation studies are needed to properly understand sleep tracker accuracy. This study evaluated performance of an under-mattress sensor to estimate sleep and wake versus polysomnography in a large sample, including individuals with and without sleep disorders and during day versus night sleep opportunities, across multiple in-laboratory studies.\u0000183 participants (51/49% male/female, mean[SD] age=45[18] years) attended the sleep laboratory for a research study including simultaneous polysomnography and under-mattress sensor (Withings Sleep Analyzer [WSA]) recordings. Epoch-by-epoch analyses determined accuracy, sensitivity, and specificity of the WSA versus polysomnography. Bland-Altman plots examined bias in sleep duration, efficiency, onset-latency, and wake after sleep onset.\u0000Overall WSA sleep-wake classification accuracy was 83%, sensitivity 95%, and specificity 37%. The WSA significantly overestimated total sleep time (48[81]minutes), Sleep efficiency (9[15]%), sleep onset latency (6[26]minutes), and underestimated wake after sleep onset (54[78]minutes). Accuracy and specificity were higher for night versus daytime sleep opportunities in healthy individuals (89% and 47% versus 82% and 26% respectively, p<0.05). Accuracy and sensitivity were also higher for healthy individuals (89% and 97%) versus those with sleep disorders (81% and 91%, p<0.05).\u0000WSA performance is comparable to other consumer sleep trackers, with high sensitivity but poor specificity compared to polysomnography. 
WSA performance was reasonably stable, but more variable in daytime sleep opportunities and in people with a sleep disorder. Contactless, under-mattress sleep sensors show promise for accurate sleep monitoring, noting the tendency to over-estimate sleep particularly where wake time is high.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-11. DOI: 10.1101/2024.09.09.24313272
Hau-Tieng Wu, Ruey-Hsing Chou, Shen-Chih Wang, Cheng-Hsi Chang, Yu-Ting Lin
Objective: Quantifying physiological dynamics from nonstationary time series for clinical decision-making is challenging, especially when comparing data across different subjects. We propose a solution and validate it using two real-world surgical databases, focusing on underutilized arterial blood pressure (ABP) signals. Method: We apply a manifold learning algorithm, Dynamic Diffusion Maps (DDMap), combined with the novel Universal Coordinate (UC) algorithm to quantify dynamics from nonstationary time series. The method is demonstrated using ABP signals and validated with liver transplant and cardiovascular surgery databases, both containing clinical outcomes. Sensitivity analyses were conducted to assess robustness and identify optimal parameters. Results: UC application is validated by significant correlations between the derived index and clinical outcomes. Sensitivity analyses confirm the algorithm's stability and help optimize parameters. Conclusions: DDMap combined with UC enables dynamic quantification of ABP signals and comparison across subjects. This technique repurposes typically discarded ABP signals in the operating room, with potential applications to other nonstationary biomedical signals in both hospital and homecare settings.
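A generic diffusion-maps sketch conveys the embedding idea that DDMap builds on; this is the standard construction only, not the authors' DDMap or Universal Coordinate algorithms, and the data here are a toy 1-D sequence:

```python
import math
import random

def diffusion_coordinate(points, eps=1.0, iters=200):
    """First nontrivial eigenvector of a diffusion operator built from a
    Gaussian kernel: a single 'diffusion coordinate' that orders samples
    along the underlying manifold. Power-iteration sketch, stdlib only."""
    n = len(points)
    K = [[math.exp(-((points[i] - points[j]) ** 2) / eps) for j in range(n)]
         for i in range(n)]
    P = [[K[i][j] / sum(K[i]) for j in range(n)] for i in range(n)]  # row-stochastic
    rng = random.Random(0)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        m = sum(v) / n
        v = [x - m for x in v]                 # project out the trivial constant mode
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return v

# Samples along a 1-D "manifold": the coordinate recovers their ordering.
coord = diffusion_coordinate(list(range(12)), eps=4.0)
print([round(c, 3) for c in coord])
```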
{"title":"Universal coordinate on wave-shape manifold of cardiovascular waveform signal for dynamic quantification and cross-subject comparison","authors":"Hau-Tieng Wu, Ruey-Hsing Chou, Shen-Chih Wang, Cheng-Hsi Chang, Yu-Ting Lin","doi":"10.1101/2024.09.09.24313272","DOIUrl":"https://doi.org/10.1101/2024.09.09.24313272","url":null,"abstract":"Objective: Quantifying physiological dynamics from nonstationary time series for clinical decision-making is challenging, especially when comparing data across different subjects. We propose a solution and validate it using two real-world surgical databases, focusing on underutilized arterial blood pressure (ABP) signals. Method: We apply a manifold learning algorithm, Dynamic Diffusion Maps (DDMap), combined with the novel Universal Coordinate (UC) algorithm to quantify dynamics from nonstationary time series. The method is demonstrated using ABP signal and validated with liver transplant and cardiovascular surgery databases, both containing clinical outcomes. Sensitivity analyses were conducted to assess robustness and identify optimal parameters. Results: UC application is validated by significant correlations between the derived index and clinical outcomes. Sensitivity analyses confirm the algorithms stability and help optimize parameters. Conclusions: DDMap combined with UC enables dynamic quantification of ABP signals and comparison across subjects. 
This technique repurposes typically discarded ABP signals in the operating room, with potential applications to other nonstationary biomedical signals in both hospital and homecare settings.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-10. DOI: 10.1101/2024.09.09.24313059
Louis Rebaud, Nicolo Capobianco, Clementine Sarkozy, Anne-Segolene Cottereau, Laetitia Vercellino, Olivier Casasnovas, Catherine Thieblemont, Bruce Spottiswoode, Irene Buvat
Objectives: The Robust and Optimized Biomarker Identifier (ROBI) feature selection pipeline is introduced to improve the identification of informative biomarkers encoding information not already captured by existing features. It aims to maximize the number of discoveries while minimizing and estimating the number of false positives (FP), with an adjustable selection stringency. Methods: 500 synthetic datasets and retrospective data from 378 Diffuse Large B Cell Lymphoma (DLBCL) patients were used for validation. On the DLBCL data, two established radiomic biomarkers, TMTV and Dmax, were measured from the 18F-FDG PET/CT scans, and 10,000 random ones were generated. Selection was performed and verified on each dataset. The efficacy of ROBI was compared to methods controlling for multiple testing and to a Cox model with Elasticnet penalty. Results: On synthetic datasets, ROBI selected significantly more true positives (TP) than FP (p < 0.001), and for 99.3% of datasets, the number of FP was within the estimated 95% confidence interval. ROBI significantly increased the number of TP compared to usual feature selection methods (p < 0.001). On retrospective data, ROBI selected the two established biomarkers and one random biomarker, with an estimated 95% chance of selecting 0 or 1 FP and a probability of 0.0014 of selecting only FP. Bonferroni correction selected no feature, and Elasticnet selected 101 spurious features and discarded TMTV. Conclusion: ROBI selected relevant biomarkers while effectively controlling for FP, outperforming conventional selection methods. This underscores its potential as a valuable asset for biomarker discovery.
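The strategy of injecting deliberately random features to gauge false positives can be sketched with a toy univariate screen; the correlation threshold and the screen itself are stand-ins for illustration, not ROBI's actual selection rule:

```python
import random

def corr(x, y):
    """Pearson correlation, stdlib only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def screen(features, outcome, threshold=0.25):
    """Select features whose |correlation| with the outcome passes a
    threshold -- a stand-in for a real pipeline's univariate test."""
    return [name for name, vals in features.items()
            if abs(corr(vals, outcome)) > threshold]

rng = random.Random(0)
n = 200
outcome = [rng.gauss(0, 1) for _ in range(n)]
features = {"real": [o + rng.gauss(0, 0.5) for o in outcome]}   # informative
features.update({f"noise_{i}": [rng.gauss(0, 1) for _ in range(n)]
                 for i in range(1000)})                          # pure noise

selected = screen(features, outcome)
false_hits = [s for s in selected if s.startswith("noise_")]
# The hit rate among deliberately random features estimates how many
# false positives to expect among the real ones.
print(len(selected), len(false_hits))
```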
{"title":"ROBI: a Robust and Optimized Biomarker Identifier to increase the likelihood of discovering relevant radiomic features.","authors":"Louis Rebaud, Nicolo Capobianco, Clementine Sarkozy, Anne-Segolene Cottereau, Laetitia Vercellino, Olivier Casasnovas, Catherine Thieblemont, Bruce Spottiswoode, Irene Buvat","doi":"10.1101/2024.09.09.24313059","DOIUrl":"https://doi.org/10.1101/2024.09.09.24313059","url":null,"abstract":"Objectives: The Robust and Optimized Biomarker Identifier (ROBI) feature selection pipeline is introduced to improve the identification of informative biomarkers coding information not already captured by existing features. It aims to accurately maximize the number of discoveries while minimizing and estimating the number of false positives (FP) with an adjustable selection stringency.\u0000Methods: 500 synthetic datasets and retrospective data of 378 Diffuse Large B Cell Lymphoma (DLBCL) patients were used for validation. On the DLBCL data, two established radiomic biomarkers, TMTV and Dmax, were measured from the 18F-FDG PET/CT scans, and 10,000 random ones were generated. Selection was performed and verified on each dataset. The efficacy of ROBI has been compared to methods controlling for multiple testing and a Cox model with Elasticnet penalty.\u0000Results: On synthetic datasets, ROBI selected significantly more true positives (TP) than FP (p < 0.001), and for 99.3% of datasets, the number of FP was within the estimated 95% confidence interval. ROBI significantly increased the number of TP compared to usual feature selection methods (p < 0.001). On retrospective data, ROBI selected the two established biomarkers and one random biomarker and estimated 95% chance of selecting 0 or 1 FP and a probability of 0.0014 of selecting only FP. 
Bonferroni correction selected no feature, and Elasticnet selected 101 spurious features and discarded TMTV.\u0000Conclusion: ROBI selected relevant biomarkers while effectively controlling for FPs, outperforming conventional selection methods. This underscores its potential as a valuable asset for biomarker discovery.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-08. DOI: 10.1101/2024.09.07.24313245
Xiaomeng Wang, Zhimei Ren, Jiancheng Ye
Heart failure (HF) is a critical public health issue, particularly for critically ill patients in intensive care units (ICUs). Predicting survival outcomes in critically ill patients is a difficult yet crucially important task for guiding timely treatment. This study utilizes a novel approach, conformalized survival analysis (CSA), designed to construct lower bounds on the survival time of critically ill HF patients with high confidence. Utilizing data from the MIMIC-IV dataset, this work demonstrates that CSA outperforms traditional survival models, such as the Cox proportional hazards model and the Accelerated Failure Time (AFT) model, particularly in providing reliable, interpretable, and individualized predictions. By applying CSA to a large, real-world dataset, the study highlights its potential to improve decision-making in critical care, offering a more nuanced and accurate tool for prognostication in a setting where precise predictions and guaranteed uncertainty quantification can significantly influence patient outcomes.
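The conformal idea behind calibrated lower bounds can be sketched in its simplest split-conformal form; this toy version ignores censoring (which the actual CSA method is designed to handle) and uses simulated data:

```python
import math
import random

def conformal_lower_bounds(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split-conformal one-sided bound: for exchangeable data, the true
    value exceeds the returned lower bound with probability >= 1 - alpha.
    Toy sketch -- censoring, which CSA handles, is ignored here."""
    scores = sorted(p - t for p, t in zip(cal_pred, cal_true))  # over-prediction
    k = min(len(scores) - 1, math.ceil((1 - alpha) * (len(scores) + 1)) - 1)
    q = scores[k]          # conservative (1 - alpha) quantile of the scores
    return [p - q for p in test_pred]

rng = random.Random(1)
true = [abs(rng.gauss(10, 3)) for _ in range(500)]   # simulated survival times
pred = [t + rng.gauss(0, 2) for t in true]           # noisy model predictions
lb = conformal_lower_bounds(pred[:250], true[:250], pred[250:])
coverage = sum(t >= b for t, b in zip(true[250:], lb)) / 250
print(round(coverage, 2))  # empirical coverage near the 0.90 target
```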
{"title":"Predicting survival time for critically ill patients with heart failure using conformalized survival analysis","authors":"Xiaomeng Wang, Zhimei Ren, Jiancheng Ye","doi":"10.1101/2024.09.07.24313245","DOIUrl":"https://doi.org/10.1101/2024.09.07.24313245","url":null,"abstract":"Heart failure (HF) is a critical public health issue, particularly for critically ill patients in intensive care units (ICUs). Predicting survival outcome in critically ill patients is a difficult yet crucially important task for timely treatment. This study utilizes a novel approach, conformalized survival analysis (CSA), designed to construct lower bounds on the survival time in critically ill HF patients with high confidence. Utilizing data from the MIMIC-IV dataset, this work demonstrates that CSA outperforms traditional survival models, such as the Cox proportional hazards model and Accelerated Failure Time (AFT) model, particularly in providing reliable, interpretable, and individualized predictions. By applying CSA to a large, real-world dataset, the study highlights its potential to improve decision-making in critical care, offering a more nuanced and accurate tool for prognostication in a setting where precise predictions and guaranteed uncertainty quantification can significantly influence patient outcomes.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-07. DOI: 10.1101/2024.09.06.24313221
Ramy Barhouche, Samson Tse, Fiona Inglis, Debbie Chaves, Erin Allison, Tina Colaco, Melody E. Morton Ninomiya
The practice of putting research into action is known by various names, depending on disciplinary norms. Knowledge mobilization, translation, and transfer (collectively referred to as K*) are three common terminologies used in research literature. Knowledge-to-action opportunities and gaps in academic research often remain obscure to non-academic researchers in communities, policy and decision makers, and practitioners who could benefit from up-to-date information on health and wellbeing. Academic research training, funding, and performance metrics rarely prioritize or address non-academic community needs from research. We propose to conduct a scoping review on reported K* in community-driven research contexts, examining the governance, processes, methods, and benefits of K*, and mapping who, what, where, and when K* terminology is used. This protocol paper outlines our approach to gathering, screening, analyzing, and reporting on available published literature from four databases.
{"title":"Knowledge mobilization with and for equity-deserving communities invested in research: A scoping review protocol","authors":"Ramy Barhouche, Samson Tse, Fiona Inglis, Debbie Chaves, Erin Allison, Tina Colaco, Melody E. Morton Ninomiya","doi":"10.1101/2024.09.06.24313221","DOIUrl":"https://doi.org/10.1101/2024.09.06.24313221","url":null,"abstract":"The practice of putting research into action is known by various names, depending on disciplinary norms. Knowledge mobilization, translation, and transfer (collectively referred to as K*) are three common terminologies used in research literature. Knowledge-to-action opportunities and gaps in academic research often remain obscure to non-academic researchers in communities, policy and decision makers, and practitioners who could benefit from up-to-date information on health and wellbeing. Academic research training, funding, and performance metrics rarely prioritize or address non-academic community needs from research. We propose to conduct a scoping review on reported K* in community-driven research contexts, examining the governance, processes, methods, and benefits of K*, and mapping who, what, where, and when K* terminology is used. This protocol paper outlines our approach to gathering, screening, analyzing, and reporting on available published literature from four databases.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"410 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06. DOI: 10.1101/2024.09.04.24312866
Mingyu Lu, Ian Covert, Nathan J. White, Su-In Lee
Determining which features drive the treatment effect for individual patients has long been a complex and critical question in clinical decision-making. Evidence from randomized controlled trials (RCTs) is the gold standard for guiding treatment decisions. However, individual patient differences often complicate the application of RCT findings, leading to imperfect treatment options. Traditional subgroup analyses fall short due to data dimensionality, type, and study design. To overcome these limitations, we propose CODE-XAI, a framework that interprets Conditional Average Treatment Effect (CATE) models using Explainable AI (XAI) to perform feature discovery. CODE-XAI provides feature attribution at the individual subject level, enhancing our understanding of treatment responses. We benchmark these XAI methods using semi-synthetic data and RCTs, demonstrating their effectiveness in uncovering feature contributions and enabling cross-cohort analysis, advancing precision medicine and scientific discovery.
{"title":"CODE - XAI: Construing and Deciphering Treatment Effects via Explainable AI using Real-world Data","authors":"Mingyu Lu, Ian Covert, Nathan J. White, Su-In Lee","doi":"10.1101/2024.09.04.24312866","DOIUrl":"https://doi.org/10.1101/2024.09.04.24312866","url":null,"abstract":"Determining which features drive the treatment effect for individual patients has long been a complex and critical question in clinical decision-making. Evidence from randomized controlled trials (RCTs) are the gold standard for guiding treatment decisions. However, individual patient differences often complicate the application of RCT findings, leading to imperfect treatment options. Traditional subgroup analyses fall short due to data dimensionality, type, and study design. To overcome these limitations, we propose CODE-XAI, a framework that interprets Conditional Average Treatment Effect (CATE) models using Explainable AI (XAI) to perform feature discovery. CODE-XAI provides feature attribution at the individual subject level, enhancing our understanding of treatment responses. We benchmark these XAI methods using semi-synthetic data and RCTs, demonstrating their effectiveness in uncovering feature contributions and enabling cross-cohort analysis, advancing precision medicine and scientific discovery.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-06. DOI: 10.1101/2024.09.05.24313156
Ann-Kathrin Schalkamp, Kathryn J Peall, Neil A Harrison, Valentina Escott-Price, Payam Barnaghi, Cynthia Sandor
Background Use of digital sensors to passively collect long-term data offers a step change in our ability to screen for early signs of disease in the general population. Smartwatch data has been shown to identify Parkinson’s disease (PD) several years before the clinical diagnosis; however, it has not been evaluated against biological and pathological markers such as dopaminergic imaging (DaTscan) or cerebrospinal fluid (CSF) alpha-synuclein seed amplification assay (SAA) in an at-risk cohort.
{"title":"Digital risk score sensitively identifies presence of α-synuclein aggregation or dopaminergic deficit","authors":"Ann-Kathrin Schalkamp, Kathryn J Peall, Neil A Harrison, Valentina Escott-Price, Payam Barnaghi, Cynthia Sandor","doi":"10.1101/2024.09.05.24313156","DOIUrl":"https://doi.org/10.1101/2024.09.05.24313156","url":null,"abstract":"<strong>Background</strong> Use of digital sensors to passively collect long-term offers a step change in our ability to screen for early signs of disease in the general population. Smartwatch data has been shown to identify Parkinson’s disease (PD) several years before the clinical diagnosis, however, has not been evaluated in comparison to biological and pathological markers such as dopaminergic imaging (DaTscan) or cerebrospinal fluid (CSF) alpha-synuclein seed amplification assay (SAA) in an at-risk cohort.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}