Prompt matters: evaluation of large language model chatbot responses related to Peyronie's disease.

IF 2.6 | CAS Tier 3, Medicine | JCR Q1, Medicine, General & Internal | Sexual Medicine | Pub Date: 2024-09-09 | eCollection Date: 2024-08-01 | DOI: 10.1093/sexmed/qfae055
Christopher J Warren, Victoria S Edmonds, Nicolette G Payne, Sandeep Voletti, Sarah Y Wu, JennaKay Colquitt, Hossein Sadeghi-Nejad, Nahid Punjani
Sexual Medicine. 2024;12(4):qfae055. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11384107/pdf/. Citations: 0.

Abstract

Introduction: Despite direct access to clinicians through the electronic health record, patients are increasingly turning to the internet for information related to their health, especially with sensitive urologic conditions such as Peyronie's disease (PD). Large language model (LLM) chatbots are a form of artificial intelligence that rely on user prompts to mimic conversation, and they have shown remarkable capabilities. The conversational nature of these chatbots has the potential to answer patient questions related to PD; however, the accuracy, comprehensiveness, and readability of these LLMs related to PD remain unknown.

Aims: To assess the quality and readability of information generated from 4 LLMs with searches related to PD; to see if users could improve responses; and to assess the accuracy, completeness, and readability of responses to artificial preoperative patient questions sent through the electronic health record prior to undergoing PD surgery.

Methods: The National Institutes of Health's frequently asked questions related to PD were entered into 4 LLMs, unprompted and prompted. The responses were evaluated for overall quality by the previously validated DISCERN questionnaire. Accuracy and completeness of LLM responses to 11 presurgical patient messages were evaluated with previously accepted Likert scales. All evaluations were performed by 3 independent reviewers in October 2023, and all reviews were repeated in April 2024. Descriptive statistics and analysis were performed.
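The "unprompted and prompted" comparison in the Methods can be illustrated with a minimal sketch. The abstract does not give the study's actual prompt wording, so the strings below are hypothetical, constructed only to show the difference between entering a question verbatim and wrapping it with role and readability instructions:

```python
# A sample NIH-style FAQ question about Peyronie's disease.
question = "What causes Peyronie's disease?"

# Unprompted condition: the question is entered verbatim.
unprompted = question

# Prompted condition (hypothetical wording): role context plus
# explicit constraints on reading level and sourcing.
prompted = (
    "You are a urologist counseling a patient. "
    "Answer at an 8th-grade reading level and cite reputable sources.\n\n"
    + question
)
```

The study's finding that quality rose from moderate to high with prompting suggests the added context, not the underlying model, drove the improvement.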

Results: Without prompting, the quality of information was moderate across all LLMs but improved to high quality with prompting. LLMs were accurate and complete, with an average score of 5.5 of 6.0 (SD, 0.8) and 2.8 of 3.0 (SD, 0.4), respectively. The average Flesch-Kincaid reading level was grade 12.9 (SD, 2.1). Chatbots were unable to communicate at a grade 8 reading level when prompted, and their citations were appropriate only 42.5% of the time.
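The Flesch-Kincaid grade level cited in the Results is a standard readability formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch, using a naive vowel-group heuristic for syllable counting rather than a dictionary-based counter:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each run of consecutive vowels counts as one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Split into sentences on terminal punctuation, dropping empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A mean grade of 12.9 means a reader needs roughly a first-year-college reading level, well above the grade 8 target commonly recommended for patient materials.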

Conclusion: LLMs may become a valuable tool for patient education for PD, but they currently rely on clinical context and appropriate prompting by humans to be useful. Unfortunately, their prerequisite reading level remains higher than that of the average patient, and their citations cannot be trusted. However, given their increasing uptake and accessibility, patients and physicians should be educated on how to interact with these LLMs to elicit the most appropriate responses. In the future, LLMs may reduce burnout by helping physicians respond to patient messages.

Source journal: Sexual Medicine (Medicine, General & Internal)
CiteScore: 5.40 · Self-citation rate: 0.00% · Annual articles: 103 · Review time: 22 weeks
Journal overview: Sexual Medicine is an official publication of the International Society for Sexual Medicine, and serves the field as the peer-reviewed, open access journal for rapid dissemination of multidisciplinary clinical and basic research in all areas of global sexual medicine, and particularly acts as a venue for topics of regional or sub-specialty interest. The journal is focused on issues in clinical medicine and epidemiology but also publishes basic science papers with particular relevance to specific populations. Sexual Medicine offers clinicians and researchers a rapid route to publication and the opportunity to publish in a broadly distributed and highly visible global forum. The journal publishes high-quality articles from all over the world and actively seeks submissions from countries with expanding sexual medicine communities. Sexual Medicine relies on the same expert panel of editors and reviewers as The Journal of Sexual Medicine and Sexual Medicine Reviews.