
medRxiv - Health Informatics: Latest Papers

Clinician Perceptions of Generative Artificial Intelligence Tools and Clinical Workflows: Potential Uses, Motivations for Adoption, and Sentiments on Impact
Pub Date : 2024-07-31 DOI: 10.1101/2024.07.29.24311177
Elise L Ruan, Abdulaziz Alkattan, Noemie Elhadad, Sarah Collins Rossetti
Successful integration of generative artificial intelligence (AI) into healthcare requires understanding health professionals' perspectives, ideally through data-driven approaches. In this study, we use a semi-structured survey and mixed-methods analyses to explore clinicians' perceptions of the utility of generative AI for all types of clinical tasks, their familiarity and competency with generative AI tools, and their sentiments regarding the potential impact of generative AI on healthcare. Analysis of 116 clinician responses found differing perceptions of the usefulness of generative AI across clinical workflows, with information gathering from external sources rated highest and communication rated lowest. Clinician-generated prompt suggestions focused most often on clinician decision making and were of mixed quality, with participants more familiar with generative AI suggesting more high-quality prompts. Sentiments regarding the impact of generative AI varied, particularly regarding trustworthiness and impact on bias. Thematic analysis of open-ended comments highlighted concerns about patient care and the role of clinicians.
Citations: 0
Data-driven characterization of individuals with delayed autism diagnosis
Pub Date : 2024-07-27 DOI: 10.1101/2024.07.26.24311003
Dan Aizenberg, Ido Shalev, Florina Uzefovsky, Alal Eran
Importance: Despite tremendous improvement in early identification of autism, ~25% of children receive their diagnosis after the age of six. Since evidence-based practices are more effective when started early, delayed diagnosis prevents many children from receiving optimal support. Objective: To identify and comparatively characterize groups of individuals diagnosed with Autism Spectrum Disorder (ASD) after the age of six. Design: This cross-sectional study used various machine learning approaches to classify, characterize, and compare individuals from the Simons Foundation Powering Autism Research for Knowledge (SPARK) cohort, recruited between 2015 and 2020. Setting: Analyses of medical histories and behavioral instruments. Participants: 23,632 SPARK participants. Exposure: ASD diagnosis upon registration to SPARK. Main Outcomes and Measures: Clusters of individuals diagnosed after the age of six (delayed ASD diagnosis) and their defining characteristics, as compared to individuals diagnosed before the age of six (timely ASD diagnosis). Odds and mean ratios were used for feature comparisons. Shapley values were used to assess the predictive value of these features, and correlation-based cliques were used to understand their interconnectedness. Results: Two robust subgroups of individuals with delayed ASD diagnosis were detected. The first, D1, included 3,612 individuals with lower support needs as compared to 17,992 individuals with a timely diagnosis. The second subgroup, D2, included 2,028 individuals with higher support needs, as consistently reflected by all commonly used behavioral instruments, with the greatest difference in repetitive and restrictive behaviors as measured by the Repetitive Behavior Scale - Revised (RBS-R; D1: MR = 0.6854, 95% CI = [0.6848, 0.686]; D2: MR = 1.4223, 95% CI = [1.4210, 1.4238], P = 3.54x10^-134). Moreover, individuals belonging to D1 had fewer comorbidities than individuals with a timely ASD diagnosis, while D2 individuals had more (D1: mean = 3.47, t = 15.21; D2: mean = 8.12, t = 48.26, p < 2.23x10^-308). A Random Forest classifier trained on the groups' characteristics achieved an AUC of 0.94. Further connectivity analysis of the groups' most informative characteristics demonstrated their distinct topological differences. Conclusions and Relevance: This analysis identified two opposite groups of individuals with delayed ASD diagnosis, thereby providing valuable insights for the development of targeted diagnostic strategies.
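The feature comparisons in this abstract rely on mean ratios (MR) and odds ratios. As a minimal sketch of how such ratios can be computed, assuming plain per-group score lists and binary feature counts, the Python below uses hypothetical function names and made-up toy values. It is not the authors' pipeline, which the abstract does not specify.

```python
from statistics import mean


def mean_ratio(group_scores, reference_scores):
    """Mean score of a subgroup divided by the mean score of the reference group."""
    return mean(group_scores) / mean(reference_scores)


def odds_ratio(group_pos, group_neg, ref_pos, ref_neg):
    """Odds of a binary feature in a subgroup divided by the odds in the reference group."""
    return (group_pos / group_neg) / (ref_pos / ref_neg)


# Made-up toy values for illustration only (not SPARK data).
timely_scores = [20, 30, 25, 25]   # reference group, mean 25
delayed_scores = [15, 20, 10, 15]  # hypothetical subgroup, mean 15

mr = mean_ratio(delayed_scores, timely_scores)  # below 1: lower scores than reference
or_ = odds_ratio(30, 70, 10, 90)                # above 1: feature more common in subgroup
print(f"mean ratio = {mr:.3f}, odds ratio = {or_:.3f}")
```

Read this way, an MR below 1 (as reported for D1) indicates lower scores than the timely-diagnosis group, and an MR above 1 (as reported for D2) indicates higher scores.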
Citations: 0
Physician experiences of electronic health records interoperability and its practical impact on care delivery in the English NHS: A cross-sectional survey study
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24311018
Edmond Li, Olivia Lounsbury, Mujtaba Hasnain, Hutan Ashrafian, Ara Darzi, Ana Luisa Neves, Jonathan Clarke
Abstract. Background: The lack of interoperability is a well-recognised limitation of electronic health records (EHRs). However, less is known about how it manifests for frontline NHS staff when delivering care, how it affects patient care, and what its implications are for care efficiency. Objectives: (1) To capture physicians' perceptions of the current state of EHR interoperability, (2) to investigate how poor interoperability affects patient care and safety, and (3) to determine its effects on care efficiency in the NHS. Methods: An online survey was conducted to explore how physicians perceived the routine use of EHRs, its effects on patient safety, and its impact on care efficiency in NHS healthcare facilities. Descriptive statistics were used to report notable findings. Results: A total of 636 NHS physicians participated. Participants reported that EHR interoperability is rudimentary across much of the NHS, with a limited ability to read, but not edit, data from within their organisation. Negative perceptions were most pronounced amongst specialties in secondary care settings and amongst those with less than one year of EHR experience or lower self-reported EHR skills. Limited interoperability prolonged hospital stays, lengthened consultation times, and frequently necessitated repeat investigations. It also impaired physician access to clinical data, hampered communication between providers, and was perceived to threaten patient safety. Conclusion: As healthcare data continues to increase in complexity and volume, EHR interoperability must evolve to accommodate these changes and ensure the continued delivery of safe care. The experiences of physicians provide valuable insight into the practical challenges that limited interoperability poses and can contribute to future policy solutions to better integrate EHRs in the clinical environment.
Public Interest/Lay Summary: Limited interoperability between EHR systems has been a longstanding problem since the technology's introduction in NHS England. However, little research has been done to understand the extent of this problem from the perspective of physicians and the challenges it poses. This study surveyed 636 physicians across England to better understand limited EHR interoperability. Most participants reported that interoperability between NHS facilities was inadequate. Consequences included longer hospital stays, lengthened consultation times, and more redundant diagnostic tests. Limited interoperability hindered communication between NHS workers and threatened care quality and patient safety. As more healthcare technologies are incorporated into the NHS, gaining greater insight from physicians is critical to finding solutions to these problems.
Citations: 0
Analyzing the Factor Structure and Sleep Quality of Pittsburgh Sleep Quality Index in Indian Information Technology Sector
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24308199
Arindam Chatterjee, Rimu Chaudhuri, Arijit Dutta
The Pittsburgh Sleep Quality Index (PSQI) has gained widespread acceptance as a useful tool for measuring sleep quality. To formulate the diagnostic process, it is essential to understand the factor structure inherent in PSQI data. In this work, we seek to estimate such a structure with a focus on Indian Information Technology (IT) workers. We used Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) for this purpose. We also used a multilayer perceptron-based method to classify the sleep quality of the sampled population. We found that, contrary to the general perception, most Indian IT employees have sleep quality in the good and very good classes.
Citations: 0
Evaluation of the Clinical Utility of DxGPT, a GPT-4 Based Large Language Model, through an Analysis of Diagnostic Accuracy and User Experience
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.23.24310847
Marina Alvarez-Estape, Ivan Cano, Rosa Pino, Carla González Grado, Andrea Aldemira-Liz, Javier Gonzálvez-Ortuño, Juanjo do Olmo, Javier Logroño, Marcelo Martínez, Carlos Mascías, Julián Isla, Jordi Martínez Roldán, Cristian Launes, Francesc Garcia-Cuyas, Paula Esteller-Cucala
Importance: The time to accurately diagnose rare pediatric diseases often spans years. Assessing the diagnostic accuracy of an LLM-based tool on real pediatric cases can help reduce this time, providing quicker diagnoses for patients and their families. Objective: To evaluate the clinical utility of DxGPT as a support tool for differential diagnosis of both common and rare diseases. Design: Unicentric descriptive cross-sectional exploratory study. Anonymized data from 50 pediatric patients' medical histories, covering common and rare pathologies, were used to generate clinical case notes. Each clinical case included essential data, with some expanded by complementary tests. Setting: This study was conducted at a reference pediatric hospital, Sant Joan de Déu Barcelona Children's Hospital. Participants: A total of 50 clinical cases were diagnosed by 78 volunteer doctors (the medical diagnostic team) with varying experience, each reviewing 3 clinical cases. Interventions: Each clinician listed up to five diagnoses per clinical case note. The same was done on the DxGPT web platform, obtaining the Top-5 diagnostic proposals. To evaluate DxGPT's variability, each note was queried three times. Main Outcome(s) and Measure(s): The study mainly focused on comparing diagnostic accuracy, defined as the percentage of cases with the correct diagnosis, between the medical diagnostic team and DxGPT. Other evaluation criteria included qualitative assessments. The medical diagnostic team also completed a survey on their user experience with DxGPT. Results: Top-5 diagnostic accuracy was 65% for clinicians and 60% for DxGPT, with no significant differences. Accuracies for common diseases were higher (clinicians: 79%, DxGPT: 71%) than for rare diseases (clinicians: 50%, DxGPT: 49%). Accuracy increased similarly in both groups with expanded information, but this increase was statistically significant only among clinicians (simple 52% vs. expanded 69%; p=0.03). DxGPT's response variability affected less than 5% of clinical case notes. A survey of 48 clinicians rated the DxGPT platform 3.9/5 overall, 4.1/5 for usefulness, and 4.5/5 for usability. Conclusions and Relevance: DxGPT showed diagnostic accuracies similar to those of medical staff from a pediatric hospital, indicating its potential for supporting differential diagnosis in other settings. Clinicians praised its usability and simplicity. These tools could provide new insights for challenging diagnostic cases.
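The main outcome in this abstract is Top-5 diagnostic accuracy, defined as the percentage of cases in which the correct diagnosis appears among up to five proposals. A minimal Python sketch of that metric follows. The function name, the toy cases, and the matching rule (case-insensitive exact string match) are assumptions made for illustration; the abstract does not say how a proposal was judged to match the correct diagnosis.

```python
def top_k_accuracy(proposals, correct, k=5):
    """Fraction of cases whose correct diagnosis appears among the first k proposals.

    Assumption: a proposal counts as a hit on a case-insensitive exact string match.
    """
    if len(proposals) != len(correct):
        raise ValueError("proposals and correct must have the same length")
    hits = 0
    for ranked, truth in zip(proposals, correct):
        top_k = {d.strip().lower() for d in ranked[:k]}
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(correct)


# Hypothetical toy cases (not study data).
toy_proposals = [
    ["asthma", "bronchiolitis", "pneumonia"],
    ["migraine", "tension headache"],
    ["appendicitis", "gastroenteritis", "constipation"],
]
toy_correct = ["Pneumonia", "epilepsy", "appendicitis"]

top5 = top_k_accuracy(toy_proposals, toy_correct, k=5)  # 2 of 3 cases hit
top1 = top_k_accuracy(toy_proposals, toy_correct, k=1)  # 1 of 3 cases hit
print(f"Top-5 accuracy: {top5:.0%}, Top-1 accuracy: {top1:.0%}")
```

In practice, matching free-text diagnoses usually needs expert review or a mapping to a coding system, so an exact-match rule like this one would tend to undercount hits.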
Citations: 0
AI-driven Integration of Multimodal Imaging Pixel Data and Genome-wide Genotype Data Enhances Precision Health for Type 2 Diabetes: Insights from a Large-scale Biobank Study
Pub Date : 2024-07-26 DOI: 10.1101/2024.07.25.24310650
Yi-Jia Huang, Chun-houh Chen, Hsin-Chou Yang
The rising prevalence of Type 2 Diabetes (T2D) presents a critical global health challenge. Effective risk assessment and prevention strategies not only improve patient quality of life but also alleviate national healthcare expenditures. The integration of medical imaging and genetic data from extensive biobanks, driven by artificial intelligence (AI), is revolutionizing precision and smart health initiatives. In this study, we applied these principles to T2D by analyzing medical images (abdominal ultrasonography and bone density scans) alongside whole-genome single nucleotide variations in 17,785 Han Chinese participants from the Taiwan Biobank. Rigorous data cleaning and preprocessing procedures were applied. Imaging analysis utilized densely connected convolutional neural networks, augmented by graph neural networks to account for intra-individual image dependencies, while genetic analysis employed Bayesian statistical learning to derive polygenic risk scores (PRS). These modalities were integrated through eXtreme Gradient Boosting (XGBoost), yielding several key findings. First, pixel-based image analysis outperformed feature-centric image analysis in accuracy, automation, and cost efficiency. Second, multi-modality analysis significantly enhanced predictive accuracy compared to single-modality approaches. Third, this comprehensive approach, combining medical imaging, genetic, and demographic data, represents a promising frontier for fusion modeling, integrating AI and statistical learning techniques in disease risk assessment. Our model achieved an Area under the Receiver Operating Characteristic Curve (AUC) of 0.944, with an accuracy of 0.875, sensitivity of 0.882, specificity of 0.875, and a Youden index of 0.754. Additionally, the analysis revealed significant positive correlations between the multi-image risk score (MRS) and T2D, as well as between the PRS and T2D, identifying high-risk subgroups within the cohort. This study pioneers the integration of multimodal imaging pixels and genome-wide genetic variation data for precise T2D risk assessment, advancing the understanding of precision and smart health.
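The abstract reports accuracy, sensitivity, specificity, and a Youden index. As a minimal sketch of how these quantities relate, the Python below derives them from confusion-matrix counts using the standard definitions (Youden index = sensitivity + specificity - 1). The function name and the counts are hypothetical and chosen for easy arithmetic; they are not taken from the Taiwan Biobank analysis.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    youden = sensitivity + specificity - 1  # Youden's J statistic
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden": youden,
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the Youden index depends only on sensitivity and specificity, it is unaffected by class prevalence, whereas accuracy is not.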
{"title":"AI-driven Integration of Multimodal Imaging Pixel Data and Genome-wide Genotype Data Enhances Precision Health for Type 2 Diabetes: Insights from a Large-scale Biobank Study","authors":"Yi-Jia Huang, Chun-houh Chen, Hsin-Chou Yang","doi":"10.1101/2024.07.25.24310650","DOIUrl":"https://doi.org/10.1101/2024.07.25.24310650","url":null,"abstract":"The rising prevalence of Type 2 Diabetes (T2D) presents a critical global health challenge. Effective risk assessment and prevention strategies not only improve patient quality of life but also alleviate national healthcare expenditures. The integration of medical imaging and genetic data from extensive biobanks, driven by artificial intelligence (AI), is revolutionizing precision and smart health initiatives.\u0000In this study, we applied these principles to T2D by analyzing medical images (abdominal ultrasonography and bone density scans) alongside whole-genome single nucleotide variations in 17,785 Han Chinese participants from the Taiwan Biobank. Rigorous data cleaning and preprocessing procedures were applied. Imaging analysis utilized densely connected convolutional neural networks, augmented by graph neural networks to account for intra-individual image dependencies, while genetic analysis employed Bayesian statistical learning to derive polygenic risk scores (PRS). These modalities were integrated through eXtreme Gradient Boosting (XGBoost), yielding several key findings.\u0000First, pixel-based image analysis outperformed feature-centric image analysis in accuracy, automation, and cost efficiency. Second, multi-modality analysis significantly enhanced predictive accuracy compared to single-modality approaches. Third, this comprehensive approach, combining medical imaging, genetic, and demographic data, represents a promising frontier for fusion modeling, integrating AI and statistical learning techniques in disease risk assessment. 
Our model achieved an Area under the Receiver Operating Characteristic Curve (AUC) of 0.944, with an accuracy of 0.875, sensitivity of 0.882, specificity of 0.875, and a Youden index of 0.754. Additionally, the analysis revealed significant positive correlations between the multi-image risk score (MRS) and T2D, as well as between the PRS and T2D, identifying high-risk subgroups within the cohort.\u0000This study pioneers the integration of multimodal imaging pixels and genome-wide genetic variation data for precise T2D risk assessment, advancing the understanding of precision and smart health.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141770493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use
Pub Date : 2024-07-25 DOI: 10.1101/2024.07.24.24310930
Jack Gallifant, Majid Afshar, Saleem Ameen, Yindalon Aphinyanaphongs, Shan Chen, Giovanni Cacciamani, Dina Demner-Fushman, Dmitriy Dligach, Roxana Daneshjou, Chrystinne Fernandes, Lasse Hyldig Hansen, Adam Landman, Liam G. McCoy, Timothy Miller, Amy Moreno, Nikolaj Munch, David Restrepo, Guergana Savova, Renato Umeton, Judy Wawira Gichoya, Gary S. Collins, Karel G. M. Moons, Leo A. Celi, Danielle S. Bitterman
Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility, and clinical applicability of LLM research in healthcare through comprehensive reporting.
Citations: 0
Individualized Machine-learning-based Clinical Assessment Recommendation System
Pub Date : 2024-07-24 DOI: 10.1101/2024.07.24.24310941
Devin R Setiawan, Yumiko Wiranto, Jeffrey M Girard, Amber Watts, Arian Ashourvan
Background: Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning framework that addresses the individualized feature addition problem and enhances diagnostic accuracy for clinical assessments.
Methods: Individualized Clinical Assessment Recommendation System (iCARE) employs locally weighted logistic regression and Shapley Additive Explanations (SHAP) value analysis to tailor feature selection to individual patient characteristics. Evaluations were conducted on synthetic and real-world datasets, including early-stage diabetes risk prediction and heart failure clinical records from the UCI Machine Learning Repository. We compared the performance of iCARE with a Global approach using statistical analysis on accuracy and area under the ROC curve (AUC) to select the best additional features.
Findings: The iCARE framework enhances predictive accuracy and AUC metrics when additional features exhibit distinct predictive capabilities, as evidenced by synthetic datasets 1-3 and the early diabetes dataset. Specifically, in synthetic dataset 1, iCARE achieved an accuracy of 0.999 and an AUC of 1.000, outperforming the Global approach with an accuracy of 0.689 and an AUC of 0.639. In the early diabetes dataset, iCARE shows improvements of 1.5-3.5% in accuracy and AUC across different numbers of initial features. Conversely, in synthetic datasets 4-5 and the heart failure dataset, where features lack discernible predictive distinctions, iCARE shows no significant advantage over global approaches on accuracy and AUC metrics.
Interpretation: iCARE provides personalized feature recommendations that enhance diagnostic accuracy in scenarios where individualized approaches are critical, improving the precision and effectiveness of medical diagnoses.
Funding: This work was supported by startup funding from the Department of Psychology at the University of Kansas provided to A.A., and the R01MH125740 award from NIH partially supported J.M.G.'s work.
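To make "locally weighted logistic regression" concrete, here is a minimal pure-Python sketch of the general technique. It is not the iCARE implementation: the Gaussian kernel, the `bandwidth` parameter, the gradient-descent settings, and the toy data are all assumptions made for illustration, and the SHAP-based feature selection step is not covered.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_logistic_predict(X, y, query, bandwidth=1.0, lr=0.5, steps=2000):
    """Fit a logistic model weighted toward `query` and return P(y=1) at `query`."""
    # Gaussian kernel: training points close to the query get larger weights.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(row, query)) / (2.0 * bandwidth ** 2))
        for row in X
    ]
    total = sum(weights)
    # Center features on the query so the fitted intercept is the logit at the query.
    centered = [[a - b for a, b in zip(row, query)] for row in X]
    coef = [0.0] * (len(query) + 1)  # intercept followed by one slope per feature
    for _ in range(steps):
        grad = [0.0] * len(coef)
        for row, label, w in zip(centered, y, weights):
            p = sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], row)))
            err = w * (p - label)  # weighted gradient of the log-loss
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        coef = [c - lr * g / total for c, g in zip(coef, grad)]
    return sigmoid(coef[0])


# Toy 1-D data (hypothetical): positives sit in the middle of the range,
# a pattern that a single global logistic model cannot represent.
X = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y = [0, 0, 1, 1, 1, 1, 0, 0]

p_mid = local_logistic_predict(X, y, query=[3.0])
p_edge = local_logistic_predict(X, y, query=[10.0])
print(p_mid, p_edge)
```

Because each prediction refits the model around its own query point, the decision rule can differ from patient to patient, which is the property the abstract contrasts with a Global approach.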
Citations: 0
Exploring Temperature Effects on Large Language Models Across Various Clinical Tasks
Pub Date : 2024-07-22 DOI: 10.1101/2024.07.22.24310824
Dhavalkumar Patel, Prem Timsina, Ganesh Raut, Robert Freeman, Matthew Levin, Girish Nadkarni, Benjamin S Glicksberg, Eyal Klang
Large Language Models (LLMs) are becoming integral to healthcare analytics. However, the influence of the temperature hyperparameter, which controls output randomness, remains poorly understood in clinical tasks. This study evaluates the effects of different temperature settings across various clinical tasks. We conducted a retrospective cohort study using electronic health records from the Mount Sinai Health System, collecting a random sample of 1283 patients from January to December 2023. Three LLMs (GPT-4, GPT-3.5, and Llama-3-70b) were tested at five temperature settings (0.2, 0.4, 0.6, 0.8, 1.0) for their ability to predict in-hospital mortality (binary classification), length of stay (regression), and the accuracy of medical coding (clinical reasoning). For mortality prediction, all models' accuracies were generally stable across different temperatures. Llama-3 showed the highest accuracy, around 90%, followed by GPT-4 (80-83%) and GPT-3.5 (74-76%). Regression analysis for predicting the length of stay showed that all models performed consistently across different temperatures. In the medical coding task, performance was also stable across temperatures, with GPT-4 achieving the highest accuracy at 17% for complete code accuracy. Our study demonstrates that LLMs maintain consistent accuracy across different temperature settings for varied clinical tasks, challenging the assumption that lower temperatures are necessary for clinical reasoning.
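For readers unfamiliar with the temperature hyperparameter, the sketch below shows the usual definition: logits are divided by the temperature before the softmax, so lower values concentrate probability on the top token. This is a generic illustration and not the study's code; the logits are hypothetical, and how each hosted model applies temperature internally is not stated in the abstract.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits, evaluated at the five settings tested in the study.
logits = [2.0, 1.0, 0.0]
distributions = {t: softmax_with_temperature(logits, t) for t in (0.2, 0.4, 0.6, 0.8, 1.0)}
for t, probs in distributions.items():
    print(t, [round(p, 3) for p in probs])
```

At low temperature the distribution is nearly deterministic, while at 1.0 the alternatives keep a noticeable share of the probability mass.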
Citations: 0
Evaluating the Potential of Wearable Technology in Early Stress Detection: A Multimodal Approach
Pub Date : 2024-07-21 DOI: 10.1101/2024.07.19.24310732
Basil A. Darwish, Nancy M. Salem, Ghada Kareem, Lamees N. Mahmoud, Ibrahim Sadek
Stress can adversely impact health, leading to issues like high blood pressure, heart diseases, and a compromised immune system. Consequently, using wearable devices to monitor stress is essential for prompt intervention and effective management. This study investigates the efficacy of wearable devices in the early detection of psychological stress, employing both binary and five-class classification models. Significant correlations were observed between stress levels and physiological signals, including Electrocardiogram (ECG), Electrodermal Activity (EDA), and Respiration (RESP), establishing these modalities as reliable biomarkers for stress detection. Utilizing the publicly available Wearable Stress and Affect Detection (WESAD) dataset, we employed two ensemble methods, Majority Voting (MV) and Weighted Averaging (WA), to integrate these signals, achieving maximum accuracies of 99.96% for binary classification and 99.59% for five-class classification. This integration significantly enhances the accuracy and robustness of the stress detection system. Furthermore, ten different classifiers were evaluated, and hyperparameter optimization and K-fold cross-validation ranging from 3-fold to 10-fold were applied. Both time-domain and frequency-domain features were examined separately. A review of commercially available wearable devices supporting these modalities was also conducted, resulting in recommendations for optimal configurations for practical applications. Our findings highlight the potential of multimodal wearable devices in advancing the early detection and continuous monitoring of psychological stress, with significant implications for future research and the development of improved stress detection systems.
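The two ensemble rules named in this abstract can be sketched in a few lines. The example below is illustrative only: the per-modality probabilities and the weights are hypothetical, and the abstract does not say how the authors chose weights or broke ties.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(prob_vectors, weights):
    """Fuse per-model class probabilities using normalized weights."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [
        sum(w * probs[k] for probs, w in zip(prob_vectors, weights)) / total
        for k in range(n_classes)
    ]


# Hypothetical class probabilities for one time window (index 0 = no stress, 1 = stress).
ecg = [0.30, 0.70]
eda = [0.60, 0.40]
resp = [0.20, 0.80]

# Majority Voting: each modality casts one vote for its most likely class.
votes = [probs.index(max(probs)) for probs in (ecg, eda, resp)]
mv_label = majority_vote(votes)

# Weighted Averaging: the weights below are assumptions, e.g. validation-based.
fused = weighted_average([ecg, eda, resp], weights=[0.5, 0.3, 0.2])
wa_label = fused.index(max(fused))
print(mv_label, wa_label, fused)
```

Majority voting discards confidence information, whereas weighted averaging lets a more reliable modality contribute more to the fused decision.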
Citations: 0