Pub Date : 2024-07-31 DOI: 10.1101/2024.07.29.24311177
Elise L Ruan, Abdulaziz Alkattan, Noemie Elhadad, Sarah Collins Rossetti
Successful integration of Generative Artificial Intelligence (AI) into healthcare requires an understanding of health professionals' perspectives, ideally through data-driven approaches. In this study, we use a semi-structured survey and mixed methods analyses to explore clinicians' perceptions of the utility of generative AI for all types of clinical tasks, familiarity and competency with generative AI tools, and sentiments regarding the potential impact of generative AI on healthcare. Analysis of 116 clinician responses found differing perceptions regarding the usefulness of generative AI across clinical workflows, with information gathering from external sources rated highest and communication rated lowest. Clinician-generated prompt suggestions focused most often on clinician decision making and were of mixed quality, with participants more familiar with generative AI suggesting more high-quality prompts. Sentiments regarding the impact of generative AI varied, particularly regarding trustworthiness and impact on bias. Thematic analysis of open-ended comments highlighted concerns about patient care and the role of clinicians.
Title: Clinician Perceptions of Generative Artificial Intelligence Tools and Clinical Workflows: Potential Uses, Motivations for Adoption, and Sentiments on Impact (medRxiv - Health Informatics)
Pub Date : 2024-07-27 DOI: 10.1101/2024.07.26.24311003
Dan Aizenberg, Ido Shalev, Florina Uzefovsky, Alal Eran
Importance: Despite tremendous improvement in early identification of autism, ~25% of children receive their diagnosis after the age of six. Since evidence-based practices are more effective when started early, delayed diagnosis prevents many children from receiving optimal support. Objective: To identify and comparatively characterize groups of individuals diagnosed with Autism Spectrum Disorder (ASD) after the age of six. Design: This cross-sectional study used various machine learning approaches to classify, characterize, and compare individuals from the Simons Foundation Powering Autism Research for Knowledge (SPARK) cohort, recruited between 2015 and 2020. Setting: Analyses of medical histories and behavioral instruments. Participants: 23,632 SPARK participants. Exposure: ASD diagnosis upon registration to SPARK. Main Outcomes and Measures: Clusters of individuals diagnosed after the age of six (delayed ASD diagnosis) and their defining characteristics, as compared to individuals diagnosed before the age of six (timely ASD diagnosis). Odds and mean ratios were used for feature comparisons. Shapley values were used to assess the predictive value of these features, and correlation-based cliques were used to understand their interconnectedness. Results: Two robust subgroups of individuals with delayed ASD diagnosis were detected. The first, D1, included 3,612 individuals with lower support needs as compared to 17,992 individuals with a timely diagnosis. The second subgroup, D2, included 2,028 individuals with higher support needs, as consistently reflected by all commonly used behavioral instruments, the greatest being repetitive and restrictive behaviors measured by the Repetitive Behavior Scale - Revised (RBS-R; D1: MR = 0.6854, 95% CI = [0.6848, 0.686]; D2: MR = 1.4223, 95% CI = [1.4210, 1.4238], P = 3.54x10^-134). 
Moreover, individuals belonging to D1 had fewer comorbidities as compared to individuals with a timely ASD diagnosis, while D2 individuals had more (D1: mean = 3.47, t = 15.21; D2: mean = 8.12, t = 48.26, p < 2.23x10^-308). A Random Forest classifier trained on the groups' characteristics achieved an AUC of 0.94. Further connectivity analysis of the groups' most informative characteristics demonstrated their distinct topological differences. Conclusions and Relevance: This analysis identified two opposite groups of individuals with delayed ASD diagnosis, thereby providing valuable insights for the development of targeted diagnostic strategies.
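The mean ratio (MR) and odds ratio comparisons named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulas or data, so this assumes the common definitions: MR is a group's mean score divided by the reference group's mean, and the odds ratio comes from a 2x2 table of feature counts. The function names and all numbers are hypothetical toy values, not SPARK data.

```python
from statistics import mean


def mean_ratio(group_scores, reference_scores):
    """Ratio of the group's mean score to the reference group's mean score."""
    return mean(group_scores) / mean(reference_scores)


def odds_ratio(group_with, group_without, ref_with, ref_without):
    """Odds ratio from a 2x2 table of counts (feature present/absent by group)."""
    return (group_with / group_without) / (ref_with / ref_without)


# Made-up toy scores on a behavioral instrument (assumption, not study data).
delayed_scores = [12, 18, 15]
timely_scores = [20, 30, 25]
mr = mean_ratio(delayed_scores, timely_scores)

# Made-up toy counts: 30 of 100 in the group have a feature, 20 of 100 in the reference.
or_value = odds_ratio(30, 70, 20, 80)

print(f"MR = {mr:.2f}, OR = {or_value:.2f}")
```

Under these definitions, an MR below 1 means the group scores lower on average than the reference group, and an MR above 1 means it scores higher, which matches the direction of the D1 and D2 values reported above.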
Title: Data-driven characterization of individuals with delayed autism diagnosis (medRxiv - Health Informatics)
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24311018
Edmond Li, Olivia Lounsbury, Mujtaba Hasnain, Hutan Ashrafian, Ara Darzi, Ana Luisa Neves, Jonathan Clarke
Abstract Background: The lack of interoperability has been a well-recognised limitation associated with the use of electronic health records (EHR). However, less is known about how it manifests for frontline NHS staff when delivering care, how it impacts patient care, and what its implications are for care efficiency. Objectives: (1) To capture the perceptions of physicians regarding the current state of EHR interoperability, (2) to investigate how poor interoperability affects patient care and safety and (3) to determine the effects on care efficiency in the NHS. Methods: An online survey was conducted to explore how physicians perceived the routine use of EHRs, its effects on patient safety, and its impact on care efficiency in NHS healthcare facilities. Descriptive statistics were used to report notable findings. Results: A total of 636 NHS physicians participated. Participants reported that EHR interoperability is rudimentary across much of the NHS, with limited ability to read but not edit data from within their organisation. Negative perceptions were most pronounced amongst specialties in secondary care settings and those with less than one year of EHR experience or lower self-reported EHR skills. Limited interoperability prolonged hospital stays, lengthened consultation times, and frequently necessitated repeat investigations. Limited EHR interoperability impaired physician access to clinical data, hampered communication between providers, and was perceived to threaten patient safety. Conclusion: As healthcare data continues to increase in complexity and volume, EHR interoperability must evolve to accommodate these growing changes and ensure the continued delivery of safe care. The experiences of physicians provide valuable insight into the practical challenges limited interoperability poses and can contribute to future policy solutions to better integrate EHRs in the clinical environment. 
Public Interest/Lay Summary Limited interoperability between EHR systems has been a longstanding problem since the technology's introduction in NHS England. However, little research has been done to understand the extent of this problem from the perspective of physicians and the challenges it poses. This study surveyed 636 physicians across England to better understand limited EHR interoperability. Most participants reported that interoperability between NHS facilities was inadequate. Consequences of this included increased duration of hospital stays, lengthened consultation times, and more redundant diagnostic tests performed. Limited interoperability hindered communication between NHS workers and threatened care quality and patient safety. As more healthcare technologies are incorporated into the NHS, gaining greater insight from physicians is critical to finding solutions to address these problems.
Title: Physician experiences of electronic health records interoperability and its practical impact on care delivery in the English NHS: A cross-sectional survey study (medRxiv - Health Informatics)
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24308199
Arindam Chatterjee, Rimu Chaudhuri, Arijit Dutta
The Pittsburgh Sleep Quality Index (PSQI) has gained widespread acceptance as a useful tool to measure sleep quality. In order to formulate the diagnosis process, it is essential that we understand the factor structure inherent in the PSQI data. In this work, we seek to estimate such a structure with a focus on Indian Information Technology (IT) workers. We have used Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) for this purpose. We have also used a multilayer perceptron-based method to see how we can classify the sleep quality of the sampled population. We have discovered that, contrary to the general perception, most Indian IT employees have sleep quality belonging to the good and very good classes.
Title: Analyzing the Factor Structure and Sleep Quality of Pittsburgh Sleep Quality Index in Indian Information Technology Sector (medRxiv - Health Informatics)
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.23.24310847
Marina Alvarez-Estape, Ivan Cano, Rosa Pino, Carla González Grado, Andrea Aldemira-Liz, Javier Gonzálvez-Ortuño, Juanjo do Olmo, Javier Logroño, Marcelo Martínez, Carlos Mascías, Julián Isla, Jordi Martínez Roldán, Cristian Launes, Francesc Garcia-Cuyas, Paula Esteller-Cucala
Importance The time to accurately diagnose rare pediatric diseases often spans years. Assessing the diagnostic accuracy of an LLM-based tool on real pediatric cases can help reduce this time, providing quicker diagnoses for patients and their families. Objective To evaluate the clinical utility of DxGPT as a support tool for differential diagnosis of both common and rare diseases. Design Unicentric descriptive cross-sectional exploratory study. Anonymized data from 50 pediatric patients' medical histories, covering common and rare pathologies, were used to generate clinical case notes. Each clinical case included essential data, with some expanded by complementary tests. Setting This study was conducted at a reference pediatric hospital, Sant Joan de Déu Barcelona Children's Hospital. Participants A total of 50 clinical cases were diagnosed by 78 volunteer doctors (medical diagnostic team) with varying experience, each reviewing 3 clinical cases. Interventions Each clinician listed up to five diagnoses per clinical case note. The same was done on the DxGPT web platform, obtaining the Top-5 diagnostic proposals. To evaluate DxGPT's variability, each note was queried three times. Main Outcome(s) and Measure(s) The study mainly focused on comparing diagnostic accuracy, defined as the percentage of cases with the correct diagnosis, between the medical diagnostic team and DxGPT. Other evaluation criteria included qualitative assessments. The medical diagnostic team also completed a survey on their user experience with DxGPT. Results Top-5 diagnostic accuracy was 65% for clinicians and 60% for DxGPT, with no significant differences. Accuracies for common diseases were higher (Clinicians: 79%, DxGPT: 71%) than for rare diseases (Clinicians: 50%, DxGPT: 49%). Accuracy increased similarly in both groups with expanded information, but this increase was only statistically significant in clinicians (simple 52% vs. expanded 69%; p=0.03). 
DxGPT's response variability affected less than 5% of clinical case notes. A survey of 48 clinicians rated the DxGPT platform 3.9/5 overall, 4.1/5 for usefulness, and 4.5/5 for usability. Conclusions and Relevance DxGPT showed diagnostic accuracies similar to medical staff from a pediatric hospital, indicating its potential for supporting differential diagnosis in other settings. Clinicians praised its usability and simplicity. These tools could provide new insights for challenging diagnostic cases.
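The Top-5 accuracy measure described above (the share of cases whose correct diagnosis appears among up to five proposals) can be sketched as follows. This is an illustrative sketch only: the function name, the exact-string matching rule, and the toy cases are assumptions. The abstract does not say how a proposal was judged to match the correct diagnosis, and a real evaluation would likely need clinical review rather than string comparison.

```python
def top_k_accuracy(ranked_proposals, correct_diagnoses, k=5):
    """Fraction of cases whose correct diagnosis is among the first k proposals."""
    hits = sum(
        1
        for proposals, truth in zip(ranked_proposals, correct_diagnoses)
        if truth in proposals[:k]  # assumption: exact string match counts as correct
    )
    return hits / len(correct_diagnoses)


# Made-up toy cases (assumption, not data from the study).
proposals = [
    ["asthma", "bronchiolitis", "pneumonia"],
    ["migraine", "tension headache"],
    ["celiac disease", "lactose intolerance", "giardiasis"],
    ["influenza", "common cold"],
]
truths = ["pneumonia", "epilepsy", "celiac disease", "influenza"]

accuracy = top_k_accuracy(proposals, truths, k=5)
print(f"Top-5 accuracy: {accuracy:.0%}")
```

Lowering `k` shows how much the ranking matters: with `k=1`, only cases where the first proposal is correct count as hits.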
Title: Evaluation of the Clinical Utility of DxGPT, a GPT-4 Based Large Language Model, through an Analysis of Diagnostic Accuracy and User Experience (medRxiv - Health Informatics)
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24310650
Yi-Jia Huang, Chun-houh Chen, Hsin-Chou Yang
The rising prevalence of Type 2 Diabetes (T2D) presents a critical global health challenge. Effective risk assessment and prevention strategies not only improve patient quality of life but also alleviate national healthcare expenditures. The integration of medical imaging and genetic data from extensive biobanks, driven by artificial intelligence (AI), is revolutionizing precision and smart health initiatives. In this study, we applied these principles to T2D by analyzing medical images (abdominal ultrasonography and bone density scans) alongside whole-genome single nucleotide variations in 17,785 Han Chinese participants from the Taiwan Biobank. Rigorous data cleaning and preprocessing procedures were applied. Imaging analysis utilized densely connected convolutional neural networks, augmented by graph neural networks to account for intra-individual image dependencies, while genetic analysis employed Bayesian statistical learning to derive polygenic risk scores (PRS). These modalities were integrated through eXtreme Gradient Boosting (XGBoost), yielding several key findings. First, pixel-based image analysis outperformed feature-centric image analysis in accuracy, automation, and cost efficiency. Second, multi-modality analysis significantly enhanced predictive accuracy compared to single-modality approaches. Third, this comprehensive approach, combining medical imaging, genetic, and demographic data, represents a promising frontier for fusion modeling, integrating AI and statistical learning techniques in disease risk assessment. Our model achieved an Area under the Receiver Operating Characteristic Curve (AUC) of 0.944, with an accuracy of 0.875, sensitivity of 0.882, specificity of 0.875, and a Youden index of 0.754. Additionally, the analysis revealed significant positive correlations between the multi-image risk score (MRS) and T2D, as well as between the PRS and T2D, identifying high-risk subgroups within the cohort. 
This study pioneers the integration of multimodal imaging pixels and genome-wide genetic variation data for precise T2D risk assessment, advancing the understanding of precision and smart health.
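The threshold-based metrics reported above (accuracy, sensitivity, specificity, Youden index) follow from a confusion matrix, and a minimal sketch may help readers relate them. The function name and the counts are hypothetical toy values, not the study's data. The Youden index is taken here in its standard form, sensitivity + specificity - 1. Plugging in the rounded values from the abstract gives 0.882 + 0.875 - 1 = 0.757, close to the reported 0.754. The abstract does not explain the small gap, which may reflect rounding.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    youden_index = sensitivity + specificity - 1  # Youden's J statistic
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_index": youden_index,
    }


# Made-up toy counts (assumption, not Taiwan Biobank results).
metrics = classification_metrics(tp=45, fp=20, tn=80, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The AUC, unlike these four metrics, is computed over all thresholds from the model's continuous risk scores, so it cannot be derived from a single confusion matrix.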
Title: AI-driven Integration of Multimodal Imaging Pixel Data and Genome-wide Genotype Data Enhances Precision Health for Type 2 Diabetes: Insights from a Large-scale Biobank Study (medRxiv - Health Informatics)
Pub Date : 2024-07-25 DOI: 10.1101/2024.07.24.24310930
Jack Gallifant, Majid Afshar, Saleem Ameen, Yindalon Aphinyanaphongs, Shan Chen, Giovanni Cacciamani, Dina Demner-Fushman, Dmitriy Dligach, Roxana Daneshjou, Chrystinne Fernandes, Lasse Hyldig Hansen, Adam Landman, Liam G. McCoy, Timothy Miller, Amy Moreno, Nikolaj Munch, David Restrepo, Guergana Savova, Renato Umeton, Judy Wawira Gichoya, Gary S. Collins, Karel G. M. Moons, Leo A. Celi, Danielle S. Bitterman
Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility, and clinical applicability of LLM research in healthcare through comprehensive reporting.
{"title":"The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use","doi":"10.1101/2024.07.24.24310930","DOIUrl":"https://doi.org/10.1101/2024.07.24.24310930","journal":{"name":"medRxiv - Health Informatics"},"publicationDate":"2024-07-25"}
Pub Date : 2024-07-24 DOI: 10.1101/2024.07.24.24310941
Devin R Setiawan, Yumiko Wiranto, Jeffrey M Girard, Amber Watts, Arian Ashourvan
Background: Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning framework that addresses the individualized feature addition problem and enhances diagnostic accuracy for clinical assessments.
Methods: Individualized Clinical Assessment Recommendation System (iCARE) employs locally weighted logistic regression and Shapley Additive Explanations (SHAP) value analysis to tailor feature selection to individual patient characteristics. Evaluations were conducted on synthetic and real-world datasets, including early-stage diabetes risk prediction and heart failure clinical records from the UCI Machine Learning Repository. We compared the performance of iCARE with a Global approach using statistical analysis on accuracy and area under the ROC curve (AUC) to select the best additional features.
Findings: The iCARE framework enhances predictive accuracy and AUC metrics when additional features exhibit distinct predictive capabilities, as evidenced by synthetic datasets 1-3 and the early diabetes dataset. Specifically, in synthetic dataset 1, iCARE achieved an accuracy of 0.999 and an AUC of 1.000, outperforming the Global approach with an accuracy of 0.689 and an AUC of 0.639. In the early diabetes dataset, iCARE shows improvements of 1.5-3.5% in accuracy and AUC across different numbers of initial features. Conversely, in synthetic datasets 4-5 and the heart failure dataset, where features lack discernible predictive distinctions, iCARE shows no significant advantage over global approaches on accuracy and AUC metrics.
Interpretation: iCARE provides personalized feature recommendations that enhance diagnostic accuracy in scenarios where individualized approaches are critical, improving the precision and effectiveness of medical diagnoses.
Funding: This work was supported by startup funding from the Department of Psychology at the University of Kansas provided to A.A., and the R01MH125740 award from NIH partially supported J.M.G.'s work.
{"title":"Individualized Machine-learning-based Clinical Assessment Recommendation System","doi":"10.1101/2024.07.24.24310941","DOIUrl":"https://doi.org/10.1101/2024.07.24.24310941","journal":{"name":"medRxiv - Health Informatics"},"publicationDate":"2024-07-24"}
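The iCARE abstract names locally weighted logistic regression as one of its building blocks. As a rough illustration of that one component only (not the authors' implementation, and without the SHAP-based feature recommendation step), the sketch below fits a separate logistic model for each query patient, weighting training samples by their closeness to the query. The Gaussian kernel, the `bandwidth` value, the gradient-ascent fit, and the toy data are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gaussian_kernel_weights(X, query, bandwidth):
    # Samples close to the query get weights near 1, distant ones near 0.
    weights = []
    for x in X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return weights

def predict_local(X, y, query, bandwidth=1.0, lr=0.1, n_iter=2000):
    """Fit a kernel-weighted logistic model around `query`; return P(y=1 | query)."""
    weights = gaussian_kernel_weights(X, query, bandwidth)
    total_w = sum(weights)
    # Center features on the query so the intercept alone gives the local prediction.
    centered = [[a - b for a, b in zip(x, query)] for x in X]
    coef = [0.0] * (len(query) + 1)  # coef[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(coef)
        for x, label, w in zip(centered, y, weights):
            p = sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], x)))
            err = w * (label - p)  # weighted log-likelihood gradient term
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        coef = [c + lr * g / total_w for c, g in zip(coef, grad)]
    return sigmoid(coef[0])

# Hypothetical one-feature data where the label pattern differs by region.
X = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y = [0, 0, 1, 1, 1, 1, 0, 0]

print(predict_local(X, y, [2.5]))  # query surrounded by positive samples
print(predict_local(X, y, [9.5]))  # query surrounded by negative samples
```

Because each query gets its own fit, the model can follow relationships that change across the feature space, which is the property an individualized approach relies on. The cost is one optimization per query.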
Pub Date : 2024-07-22 DOI: 10.1101/2024.07.22.24310824
Dhavalkumar Patel, Prem Timsina, Ganesh Raut, Robert Freeman, Matthew Levin, Girish Nadkarni, Benjamin S Glicksberg, Eyal Klang
Large Language Models (LLMs) are becoming integral to healthcare analytics. However, the influence of the temperature hyperparameter, which controls output randomness, remains poorly understood in clinical tasks. This study evaluates the effects of different temperature settings across various clinical tasks. We conducted a retrospective cohort study using electronic health records from the Mount Sinai Health System, collecting a random sample of 1283 patients from January to December 2023. Three LLMs (GPT-4, GPT-3.5, and Llama-3-70b) were tested at five temperature settings (0.2, 0.4, 0.6, 0.8, 1.0) for their ability to predict in-hospital mortality (binary classification), length of stay (regression), and the accuracy of medical coding (clinical reasoning). For mortality prediction, all models' accuracies were generally stable across different temperatures. Llama-3 showed the highest accuracy, around 90%, followed by GPT-4 (80-83%) and GPT-3.5 (74-76%). Regression analysis for predicting the length of stay showed that all models performed consistently across different temperatures. In the medical coding task, performance was also stable across temperatures, with GPT-4 achieving the highest accuracy at 17% for complete code accuracy. Our study demonstrates that LLMs maintain consistent accuracy across different temperature settings for varied clinical tasks, challenging the assumption that lower temperatures are necessary for clinical reasoning.
{"title":"Exploring Temperature Effects on Large Language Models Across Various Clinical Tasks","doi":"10.1101/2024.07.22.24310824","DOIUrl":"https://doi.org/10.1101/2024.07.22.24310824","journal":{"name":"medRxiv - Health Informatics"},"publicationDate":"2024-07-22"}
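For readers unfamiliar with the temperature hyperparameter discussed in the abstract above, the following minimal sketch shows the standard textbook form of temperature scaling: logits are divided by the temperature before the softmax. The logit values are hypothetical, and the abstract does not describe how any of the evaluated models apply temperature internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]

# The five temperature settings listed in the abstract.
for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher ones flatten the distribution. The ranking of tokens never changes, so the most likely token stays the same at every setting and only the amount of sampling randomness varies.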
Pub Date : 2024-07-21 DOI: 10.1101/2024.07.19.24310732
Basil A. Darwish, Nancy M. Salem, Ghada Kareem, Lamees N. Mahmoud, Ibrahim Sadek
Stress can adversely impact health, leading to issues like high blood pressure, heart diseases, and a compromised immune system. Consequently, using wearable devices to monitor stress is essential for prompt intervention and effective management. This study investigates the efficacy of wearable devices in the early detection of psychological stress, employing both binary and five-class classification models. Significant correlations were observed between stress levels and physiological signals, including Electrocardiogram (ECG), Electrodermal Activity (EDA), and Respiration (RESP), establishing these modalities as reliable biomarkers for stress detection. Utilizing the publicly available Wearable Stress and Affect Detection (WESAD) dataset, we employed two ensemble methods, Majority Voting (MV) and Weighted Averaging (WA), to integrate these signals, achieving maximum accuracies of 99.96% for binary classification and 99.59% for five-class classification. This integration significantly enhances the accuracy and robustness of the stress detection system. Furthermore, ten different classifiers were evaluated, and hyperparameter optimization and K-fold cross-validation ranging from 3-fold to 10-fold were applied. Both time-domain and frequency-domain features were examined separately. A review of commercially available wearable devices supporting these modalities was also conducted, resulting in recommendations for optimal configurations for practical applications. Our findings highlight the potential of multimodal wearable devices in advancing the early detection and continuous monitoring of psychological stress, with significant implications for future research and the development of improved stress detection systems.
{"title":"Evaluating the Potential of Wearable Technology in Early Stress Detection: A Multimodal Approach","doi":"10.1101/2024.07.19.24310732","DOIUrl":"https://doi.org/10.1101/2024.07.19.24310732","journal":{"name":"medRxiv - Health Informatics"},"publicationDate":"2024-07-21"}
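The stress-detection abstract combines per-signal classifiers with Majority Voting (MV) and Weighted Averaging (WA). The sketch below shows the generic form of both fusion rules. It is not the authors' pipeline: the per-modality probabilities, the weights, and the tie-breaking rule are hypothetical values chosen only to make the example concrete.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label that appears first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def weighted_average(prob_vectors, weights):
    """Fuse class-probability vectors with per-model weights.

    Returns (predicted class index, fused probability vector).
    """
    total_w = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * p[c] for p, w in zip(prob_vectors, weights)) / total_w
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused

# Hypothetical binary outputs (class 0 = no stress, class 1 = stress)
# from three single-modality models.
ecg_probs = [0.60, 0.40]
eda_probs = [0.30, 0.70]
resp_probs = [0.45, 0.55]
all_probs = [ecg_probs, eda_probs, resp_probs]

# MV works on hard labels, WA on the probabilities themselves.
hard_labels = [p.index(max(p)) for p in all_probs]
mv_label = majority_vote(hard_labels)
wa_label, fused = weighted_average(all_probs, weights=[0.5, 0.3, 0.2])
print(mv_label, wa_label, fused)
```

Majority voting discards each model's confidence, whereas weighted averaging keeps it and lets a more trusted modality count for more. In practice the weights would typically come from validation performance rather than being fixed by hand as they are here.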