Evaluation of ChatGPT as a Reliable Source of Medical Information on Prostate Cancer for Patients: Global Comparative Survey of Medical Oncologists and Urologists.

Urology Practice | IF 0.8 | Q4, Urology & Nephrology | Pub Date: 2024-11-07 | DOI: 10.1097/UPJ.0000000000000740
Arnulf Stenzl, Andrew J Armstrong, Eamonn Rogers, Dany Habr, Jochen Walz, Martin Gleave, Andrea Sboner, Jennifer Ghith, Lucile Serfass, Kristine W Schuler, Sam Garas, Dheepa Chari, Ken Truman, Cora N Sternberg
{"title":"Evaluation of ChatGPT as a Reliable Source of Medical Information on Prostate Cancer for Patients: Global Comparative Survey of Medical Oncologists and Urologists.","authors":"Arnulf Stenzl, Andrew J Armstrong, Eamonn Rogers, Dany Habr, Jochen Walz, Martin Gleave, Andrea Sboner, Jennifer Ghith, Lucile Serfass, Kristine W Schuler, Sam Garas, Dheepa Chari, Ken Truman, Cora N Sternberg","doi":"10.1097/UPJ.0000000000000740","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>No consensus exists on performance standards for evaluation of generative artificial intelligence (AI) to generate medical responses. The purpose of this study was the assessment of Chat Generative Pre-trained Transformer (ChatGPT) to address medical questions in prostate cancer.</p><p><strong>Methods: </strong>A global online survey was conducted from April to June 2023 among > 700 medical oncologists or urologists who treat patients with prostate cancer. Participants were unaware this was a survey evaluating AI. In component 1, responses to 9 questions were written independently by medical writers (MWs; from medical websites) and ChatGPT-4.0 (AI generated from publicly available information). Respondents were randomly exposed and blinded to both AI-generated and MW-curated responses; evaluation criteria and overall preference were recorded. Exploratory component 2 evaluated AI-generated responses to 5 complex questions with nuanced answers in the medical literature. Responses were evaluated on a 5-point Likert scale. Statistical significance was denoted by <i>P</i> < .05.</p><p><strong>Results: </strong>In component 1, respondents (N = 602) consistently preferred the clarity of AI-generated responses over MW-curated responses in 7 of 9 questions (<i>P</i> < .05). Despite favoring AI-generated responses when blinded to questions/answers, respondents considered medical websites a more credible source (52%-67%) than ChatGPT (14%). Respondents in component 2 (N = 98) also considered medical websites more credible than ChatGPT, but rated AI-generated responses highly for all evaluation criteria, despite nuanced answers in the medical literature.</p><p><strong>Conclusions: </strong>These findings provide insight into how clinicians rate AI-generated and MW-curated responses with evaluation criteria that can be used in future AI validation studies.</p>","PeriodicalId":45220,"journal":{"name":"Urology Practice","volume":" ","pages":"101097UPJ0000000000000740"},"PeriodicalIF":0.8000,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Urology Practice","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/UPJ.0000000000000740","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: No consensus exists on performance standards for evaluating the medical responses generated by generative artificial intelligence (AI). The purpose of this study was to assess the ability of Chat Generative Pre-trained Transformer (ChatGPT) to address medical questions about prostate cancer.

Methods: A global online survey was conducted from April to June 2023 among more than 700 medical oncologists and urologists who treat patients with prostate cancer. Participants were unaware that the survey evaluated AI. In component 1, responses to 9 questions were written independently by medical writers (MWs; drawing on medical websites) and by ChatGPT-4.0 (AI generated from publicly available information). Respondents were randomly shown both AI-generated and MW-curated responses while blinded to their source; evaluation criteria and overall preference were recorded. Exploratory component 2 evaluated AI-generated responses to 5 complex questions with nuanced answers in the medical literature. Responses were rated on a 5-point Likert scale. Statistical significance was defined as P < .05.
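The abstract does not name the statistical test behind the P < .05 threshold. A minimal sketch of how the blinded per-question preference data could be analyzed, assuming a two-sided binomial test against a 50/50 null; the counts below are purely illustrative, not the study's data:

```python
# Hypothetical sketch: testing blinded preference for AI-generated vs
# MW-curated responses per question. Assumes a two-sided binomial test
# against a 50/50 null for each question; the paper does not state its
# exact test, and these counts are illustrative only.
from scipy.stats import binomtest

N_RESPONDENTS = 602  # component 1 sample size from the abstract
ALPHA = 0.05

# Illustrative per-question counts of respondents preferring the
# AI-generated response (hypothetical values).
preferred_ai = {"Q1": 370, "Q2": 355, "Q3": 340}

for question, k in preferred_ai.items():
    result = binomtest(k, n=N_RESPONDENTS, p=0.5, alternative="two-sided")
    flag = "significant" if result.pvalue < ALPHA else "not significant"
    print(f"{question}: {k}/{N_RESPONDENTS} preferred AI, "
          f"p = {result.pvalue:.4f} ({flag} at alpha = {ALPHA})")
```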

Results: In component 1, respondents (N = 602) consistently preferred the clarity of AI-generated responses over MW-curated responses for 7 of 9 questions (P < .05). Despite favoring AI-generated responses when blinded to the source of the answers, respondents considered medical websites a more credible source (52%-67%) than ChatGPT (14%). Respondents in component 2 (N = 98) also considered medical websites more credible than ChatGPT, yet rated AI-generated responses highly on all evaluation criteria, despite the nuanced answers in the medical literature.
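For component 2's 5-point Likert ratings, a minimal sketch of how scores might be summarized per evaluation criterion; the criterion names and scores below are illustrative assumptions, not the study's data or its actual criteria:

```python
# Hypothetical sketch: summarizing 5-point Likert ratings per evaluation
# criterion. Criterion names and scores are illustrative placeholders.
from statistics import mean, median

ratings_by_criterion = {
    "accuracy":     [4, 5, 4, 3, 5, 4],
    "clarity":      [5, 4, 5, 4, 4, 5],
    "completeness": [3, 4, 4, 3, 4, 4],
}

for criterion, scores in ratings_by_criterion.items():
    print(f"{criterion}: mean = {mean(scores):.2f}, "
          f"median = {median(scores)}, n = {len(scores)}")
```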

Conclusions: These findings provide insight into how clinicians rate AI-generated and MW-curated responses, and they offer evaluation criteria that can be used in future AI validation studies.

Source journal: Urology Practice (Urology & Nephrology)
CiteScore: 1.80
Self-citation rate: 12.50%
Articles published: 163
Latest articles in this journal:
Disproportionality Analysis of Hypotension-Related Adverse Drug Reactions Associated with Type 1a Selective Alpha Blockers in VigiBase.
Renal Function After Partial Nephrectomy: A Comparative Analysis of Warm, Cold, and No Ischemia Methods.
Risks and Benefits of Caprini Score Recommended Thromboprophylaxis After Radical Prostatectomy and Nephrectomy.
Trends in Outpatient Radical Prostatectomy and Same-Day Discharge for Prostate Cancer: Analysis of the National Inpatient Sample and Nationwide Ambulatory Surgery Sample.
Evidence-Based Framework for Surgical Irrigation Fluid Stewardship and Endoscopic Case Prioritization During Fluid Shortages.