Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses From Closed-Domain Large Language Models Versus Clinical Teams

Yuexing Hao MS, Jason Holmes PhD, Jared Hobson MD, Alexandra Bennett MD, Elizabeth L. McKone MD, Daniel K. Ebner MD, David M. Routman MD, Satomi Shiraishi MD, Samir H. Patel MD, Nathan Y. Yu MD, Chris L. Hallemeier MD, Brooke E. Ball MSN, Mark Waddle MD, Wei Liu PhD

Mayo Clinic Proceedings: Digital Health, Vol. 3, No. 1, Article 100198, March 1, 2025. DOI: 10.1016/j.mcpdig.2025.100198
Abstract
Objective
To evaluate the effectiveness of RadOnc-GPT (generative pretrained transformer), a GPT-4-based large language model, in assisting with in-basket message response generation for prostate cancer treatment, with the goal of reducing the workload and time burden on clinical care teams while maintaining response quality.
Patients and Methods
RadOnc-GPT was integrated with electronic health records from both Mayo Clinic-wide databases and a radiation-oncology-specific database. The model was evaluated on 158 previously recorded in-basket message interactions drawn from 90 patients with nonmetastatic prostate cancer in the Mayo Clinic Department of Radiation Oncology in-basket message database for the calendar years 2022-2024. Quantitative natural language processing analysis and 2 grading studies, conducted by 5 clinicians and 4 nurses, were used to assess RadOnc-GPT’s responses. Three primary clinicians independently graded all messages; a fourth senior clinician reviewed 41 responses with grading discrepancies, and a fifth senior clinician evaluated 2 additional responses. The grading focused on 5 key areas: completeness, correctness, clarity, empathy, and editing time. The grading study was performed from July 20, 2024 to December 15, 2024.
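The abstract does not specify which natural language processing metrics or score-aggregation procedures were used; the Python sketch below is purely illustrative, assuming a simple lexical-overlap measure between generated and clinician responses and averaged Likert-style ratings on the quality dimensions named above (function names, scales, and example values are hypothetical).

```python
# Illustrative sketch only: the study's actual NLP metrics and grading
# aggregation are not described in the abstract.
from statistics import mean

# Quality dimensions from the grading rubric (editing time was tracked separately).
RUBRIC = ("completeness", "correctness", "clarity", "empathy")

def jaccard_similarity(a: str, b: str) -> float:
    """Crude lexical overlap between a generated response and a clinician response."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def aggregate_grades(grades):
    """Average each rubric dimension across independent graders (1-5 Likert assumed)."""
    return {area: mean(g[area] for g in grades) for area in RUBRIC}

# Hypothetical scores from two graders for a single message.
example_grades = [
    {"completeness": 4, "correctness": 5, "clarity": 4, "empathy": 5},
    {"completeness": 5, "correctness": 4, "clarity": 4, "empathy": 4},
]
print(aggregate_grades(example_grades))
print(jaccard_similarity(
    "Your PSA remains stable and no change in treatment is needed.",
    "The PSA is stable; we do not need to change the plan.",
))
```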
Results
RadOnc-GPT slightly outperformed the clinical care team in empathy while achieving comparable scores in completeness, correctness, and clarity. The 5 clinician graders identified key limitations in RadOnc-GPT’s responses, including lack of context, insufficient domain-specific knowledge, inability to perform essential meta-tasks, and hallucination. It was estimated that RadOnc-GPT could save an average of 5.2 minutes per message for nurses and 2.4 minutes for clinicians, measured from reading the inquiry to sending the response.
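As a back-of-the-envelope illustration of how the reported per-message savings scale with message volume, the sketch below projects monthly staff time saved; the monthly message volume is an assumption for illustration, not study data.

```python
# Per-message time savings reported in the Results; the monthly message
# volume passed in below is a hypothetical assumption.
NURSE_SAVING_MIN = 5.2       # minutes saved per message (nurses)
CLINICIAN_SAVING_MIN = 2.4   # minutes saved per message (clinicians)

def projected_hours_saved(messages_per_month: int) -> tuple[float, float]:
    """Return (nurse_hours, clinician_hours) saved per month at a given volume."""
    return (
        messages_per_month * NURSE_SAVING_MIN / 60,
        messages_per_month * CLINICIAN_SAVING_MIN / 60,
    )

print(projected_hours_saved(500))  # e.g., 500 messages/month -> (~43.3, 20.0) hours
```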
Conclusion
RadOnc-GPT has the potential to considerably reduce the workload of clinical care teams by generating high-quality, timely responses for in-basket message interactions. This could lead to improved efficiency in health care workflows and reduced costs while maintaining or enhancing the quality of communication between patients and health care providers.