Quality of Chatbot Information Related to Benign Prostatic Hyperplasia

Christopher J Warren, Nicolette G Payne, Victoria S Edmonds, Sandeep S Voleti, Mouneeb M Choudry, Nahid Punjani, Haider M Abdul-Muhsin, Mitchell R Humphreys

The Prostate, published online November 8, 2024. DOI: 10.1002/pros.24814
Abstract
Background: Large language model (LLM) chatbots, a form of artificial intelligence (AI) that excels at prompt-based interactions and mimics human conversation, have emerged as a tool for providing patients with information about urologic conditions. We aimed to examine the quality of information related to benign prostatic hyperplasia surgery from four chatbots and how they would respond to sample patient messages.
Methods: We identified the top three queries in Google Trends related to "treatment for enlarged prostate." These were entered into ChatGPT (OpenAI), Bard (Google), Bing AI (Microsoft), and Doximity GPT (Doximity), both unprompted and with prompts specifying quality criteria (the optimized condition). Three urologists evaluated the overall quality of each chatbot's answers using the DISCERN instrument. Readability was measured with the Flesch-Kincaid reading-level tool built into Microsoft Word. To assess the chatbots' ability to answer patient questions, we prompted them with a clinical scenario related to holmium laser enucleation of the prostate, followed by the 10 questions the National Institutes of Health recommends patients ask before surgery. Accuracy and completeness of responses were graded on Likert scales.
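For context, the Flesch-Kincaid grade level used here is a fixed formula over sentence, word, and syllable counts: 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59. A minimal Python sketch follows; the vowel-group syllable counter is a rough heuristic introduced for illustration (not part of the study), so its scores will differ slightly from Microsoft Word's built-in tool.

    import re

    def flesch_kincaid_grade(text: str) -> float:
        """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
        sentences = max(1, len(re.findall(r"[.!?]+", text)))  # runs of terminal punctuation
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))

        def syllables(word: str) -> int:
            # Rough heuristic: each contiguous vowel group counts as one syllable, minimum one.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        total_syllables = sum(syllables(w) for w in words)
        return 0.39 * (n_words / sentences) + 11.8 * (total_syllables / n_words) - 15.59

    # Very simple sentences can score at or below grade 0; dense clinical prose scores far higher.
    print(round(flesch_kincaid_grade("The cat sat on the mat."), 1))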
Results: Without prompting, the quality of information was moderate across all chatbots but improved significantly with prompting (mean [SD], 3.3 [1.2] vs. 4.4 [0.7] out of 5; p < 0.001). When answering simulated patient messages, the chatbots were accurate (mean [SD], 5.6 [0.4] out of 6) and complete (mean [SD], 2.8 [0.3] out of 3), and 98% of responses (39/40) had a median accuracy score of 5 or higher, corresponding to "nearly all correct." Readability was poor, with a mean (SD) Flesch-Kincaid reading level of grade 12.1 (1.3) for unprompted responses.
Conclusions: LLM chatbots hold promise for patient education, but their effectiveness is limited by the need for careful prompting from the user and by responses written at a reading level above that of most Americans (grade 8). Educating patients and physicians on optimal LLM interaction is crucial to unlocking the full potential of chatbots.
About the Journal
The Prostate is a peer-reviewed journal dedicated to original studies of the prostate and the male accessory glands. It serves as an international medium for these studies, presenting comprehensive coverage of clinical, anatomic, embryologic, physiologic, endocrinologic, and biochemical research.