Quality of Chatbot Information Related to Benign Prostatic Hyperplasia.

Journal: Prostate (IF 2.6; Region 3, Medicine; JCR Q3, Endocrinology & Metabolism) Pub Date: 2024-11-08 DOI: 10.1002/pros.24814
Christopher J Warren, Nicolette G Payne, Victoria S Edmonds, Sandeep S Voleti, Mouneeb M Choudry, Nahid Punjani, Haider M Abdul-Muhsin, Mitchell R Humphreys
Citation count: 0

Abstract

Background: Large language model (LLM) chatbots, a form of artificial intelligence (AI) that excels at prompt-based interactions and mimics human conversation, have emerged as a tool for providing patients with information about urologic conditions. We aimed to examine the quality of information related to benign prostatic hyperplasia surgery from four chatbots and how they would respond to sample patient messages.

Methods: We identified the top three queries in Google Trends related to "treatment for enlarged prostate." These were entered into ChatGPT (OpenAI), Bard (Google), Bing AI (Microsoft), and Doximity GPT (Doximity), both unprompted and prompted for specific criteria (optimized). The chatbot-provided answers to each query were evaluated for overall quality by three urologists using the DISCERN instrument. Readability was measured with the built-in Flesch-Kincaid reading level tool in Microsoft Word. To assess the ability of chatbots to answer patient questions, we prompted the chatbots with a clinical scenario related to holmium laser enucleation of the prostate, followed by 10 questions that the National Institutes of Health recommends patients ask before surgery. Accuracy and completeness of responses were graded with Likert scales.
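
The readability measurement can be illustrated with the standard Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is not the Microsoft Word tool used in the study: the function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so its scores may differ from Word's.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not Word's method): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' ("prostate"), but not "-le" or "-ee" endings.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    simple = "The cat sat."
    dense = ("Holmium laser enucleation significantly ameliorates "
             "obstructive urinary symptomatology.")
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Longer sentences and more polysyllabic words both raise the grade, which is why answers dense with clinical terminology tend to score above the grade 8 level cited in the conclusions.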

Results: Without prompting, the quality of information was moderate across all chatbots but improved significantly with prompting (mean [SD], 3.3 [1.2] vs. 4.4 [0.7] out of 5; p < 0.001). When answering simulated patient messages, the chatbots were accurate (mean [SD], 5.6 [0.4] out of 6) and complete (mean [SD], 2.8 [0.3] out of 3). Additionally, 98% (39/40) had a median score of 5 or higher for accuracy, which corresponds to "nearly all correct." The readability was poor, with a mean (SD) Flesch-Kincaid reading level grade of 12.1 (1.3) (unprompted).
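
Because the reported means sit on different scales (out of 5, 6, and 3), it can help to express each as a share of its scale maximum. The sketch below only re-derives figures from the numbers quoted above; the dictionary labels and the helper `share_of_max` are illustrative names, not terms from the study.

```python
# Reported means and scale maxima, copied from the Results paragraph.
reported = {
    "quality, unprompted": (3.3, 5),
    "quality, prompted": (4.4, 5),
    "accuracy": (5.6, 6),
    "completeness": (2.8, 3),
}


def share_of_max(mean: float, scale_max: float) -> float:
    """Express a mean score as a fraction of the scale's maximum value."""
    return mean / scale_max


for label, (mean, scale_max) in reported.items():
    print(f"{label}: {share_of_max(mean, scale_max):.0%} of scale maximum")

# 39 of 40 responses had a median accuracy score of 5 or higher.
accurate_fraction = 39 / 40
print(f"{accurate_fraction:.1%}")  # 97.5%, which the abstract reports as 98%
```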

Conclusions: LLM chatbots hold promise for patient education, but their effectiveness is limited by the need for careful prompting from the user and by responding at a reading level higher than that of most Americans (grade 8). Educating patients and physicians on optimal LLM interaction is crucial to unlock the full potential of chatbots.

Source journal: Prostate (Medicine: Urology & Nephrology)
CiteScore: 5.10
Self-citation rate: 3.60%
Articles published: 180
Review time: 1.5 months
About the journal: The Prostate is a peer-reviewed journal dedicated to original studies of this organ and the male accessory glands. It serves as an international medium for these studies, presenting comprehensive coverage of clinical, anatomic, embryologic, physiologic, endocrinologic, and biochemical studies.