Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial

Ethan Goh, Robert Gallo, Eric Strong, Yingjie Weng, Hannah Kerman, Jason Freed, Josephine A Cool, Zahir Kanjee, Kathleen Lane, Andrew S Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew PJ Olson, Jason Hom, Jonathan H. Chen, Adam Rodman
{"title":"大语言模型对管理推理的影响:随机对照试验","authors":"Ethan Ethan, Robert Gallo, Eric Strong, Yingjie Weng, Hannah Kerman, Jason Freed, Josephine A Cool, Zahir Kanjee, Kathleen Lane, Andrew S Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew PJ Olson, Jason Hom, Jonathan H. Chen, Adam Rodman","doi":"10.1101/2024.08.05.24311485","DOIUrl":null,"url":null,"abstract":"Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown.\nObjective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources.\nDesign: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.\nSetting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.\nParticipants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine. Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone.\nMain Outcomes and Measures: The primary outcome was difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.\nResults: Physicians using the LLM scored higher compared to those using conventional resources (mean difference 6.5 %, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).\nConclusions and Relevance: LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases. Trial Registration ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"369 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial\",\"authors\":\"Ethan Ethan, Robert Gallo, Eric Strong, Yingjie Weng, Hannah Kerman, Jason Freed, Josephine A Cool, Zahir Kanjee, Kathleen Lane, Andrew S Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew PJ Olson, Jason Hom, Jonathan H. 
Chen, Adam Rodman\",\"doi\":\"10.1101/2024.08.05.24311485\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown.\\nObjective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources.\\nDesign: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.\\nSetting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.\\nParticipants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine. Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone.\\nMain Outcomes and Measures: The primary outcome was difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.\\nResults: Physicians using the LLM scored higher compared to those using conventional resources (mean difference 6.5 %, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).\\nConclusions and Relevance: LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases. Trial Registration ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423\",\"PeriodicalId\":501454,\"journal\":{\"name\":\"medRxiv - Health Informatics\",\"volume\":\"369 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"medRxiv - Health Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2024.08.05.24311485\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.08.05.24311485","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning, where there are no clear right answers, is unknown.
Objective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources.
Design: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.
Setting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.
Participants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine.
Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone.
Main Outcomes and Measures: The primary outcome was the difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.
Results: Physicians using the LLM scored higher than those using conventional resources (mean difference 6.5%, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in the management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).
Conclusions and Relevance: LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases.
Trial Registration: ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423
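The results above are reported as between-group mean differences with 95% confidence intervals. As a rough illustration of what such a comparison involves, the sketch below computes a mean difference, a Welch two-sample p-value, and a 95% CI in Python. This is not the trial's analysis code (the study's statistical model is not specified in the abstract), and the score arrays are hypothetical placeholders, not study data.

```python
# Illustrative sketch only: between-group mean difference in rubric scores
# with a Welch two-sample test and a 95% CI. Placeholder data, not trial data.
import numpy as np
from scipy import stats

llm_group = np.array([72.0, 68.5, 81.0, 75.5, 70.0])           # hypothetical % scores, LLM + conventional resources
conventional_group = np.array([65.0, 63.5, 74.0, 69.5, 66.0])  # hypothetical % scores, conventional resources only

# Point estimate: difference in group means
diff = llm_group.mean() - conventional_group.mean()

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(llm_group, conventional_group, equal_var=False)

# 95% CI for the difference in means using the Welch-Satterthwaite degrees of freedom
s1, s2 = llm_group.var(ddof=1), conventional_group.var(ddof=1)
n1, n2 = len(llm_group), len(conventional_group)
se = np.sqrt(s1 / n1 + s2 / n2)
df = se**4 / ((s1 / n1) ** 2 / (n1 - 1) + (s2 / n2) ** 2 / (n2 - 1))
ci_low, ci_high = stats.t.interval(0.95, df, loc=diff, scale=se)

print(f"mean difference = {diff:.1f} points, 95% CI ({ci_low:.1f}, {ci_high:.1f}), p = {p_value:.3f}")
```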