Utilization of ChatGPT for Rhinology Patient Education: Limitations in a Surgical Sub-Specialty.

OTO Open (IF 1.8, Q2 Otorhinolaryngology). Pub Date: 2025-01-07; eCollection Date: 2025-01-01. DOI: 10.1002/oto2.70065
Alice E Huang, Michael T Chang, Ashoke Khanwalkar, Carol H Yan, Katie M Phillips, Michael J Yong, Jayakar V Nayak, Peter H Hwang, Zara M Patel

Abstract

Objective: To analyze the accuracy of ChatGPT-generated responses to common rhinologic patient questions.

Methods: Ten common questions from rhinology patients were compiled by a panel of 4 rhinology fellowship-trained surgeons based on clinical patient experience. This panel (Panel 1) developed consensus "expert" responses to each question. Questions were individually posed to ChatGPT (version 3.5) and its responses recorded. ChatGPT-generated responses were individually graded by Panel 1 on a scale of 0 (incorrect) to 3 (correct and exceeding the quality of expert responses). A second panel was given the consensus and ChatGPT responses to each question and asked to guess which response corresponded to which source. They then graded ChatGPT responses using the same criteria as Panel 1. Question-specific and overall mean grades for ChatGPT responses, as well as the intraclass correlation coefficient (ICC) as a measure of interrater reliability, were calculated.
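The abstract does not specify which ICC model the authors used, but a two-way random-effects, absolute-agreement, single-rater ICC(2,1) is a common choice for this study design (each question graded by every rater). As a minimal sketch of how such a coefficient is computed from a questions-by-raters grade matrix (all function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def icc2_1(X):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    X is an (n_subjects, k_raters) array of grades, e.g. 10 questions
    graded 0-3 by 4 raters. Mean squares come from the standard two-way
    ANOVA decomposition (subjects x raters, one observation per cell).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)   # per-question means
    col_means = X.mean(axis=0)   # per-rater means
    # Between-subjects, between-raters, and residual mean squares
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)
    sse = np.sum((X - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical grades: 10 questions x 4 raters (NOT the study's data)
grades = np.array([
    [2, 2, 1, 2], [1, 0, 1, 1], [3, 2, 3, 3], [1, 1, 2, 1], [2, 3, 2, 2],
    [0, 1, 0, 1], [2, 2, 2, 1], [1, 2, 1, 1], [3, 3, 2, 3], [1, 1, 1, 2],
])
print(f"ICC(2,1) = {icc2_1(grades):.3f}")
```

With the study's actual grade matrix, this calculation would yield the reported overall ICC of 0.526; conventional cut-offs place values between 0.5 and 0.75 in the "moderate reliability" range, matching the authors' interpretation.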

Results: The overall mean grade for ChatGPT responses was 1.65/3. For 2 out of 10 questions, ChatGPT responses were equal to or better than expert responses. However, for the remaining 8 questions, mean rater grades indicated that ChatGPT's responses were incorrect, false, or incomplete. The overall ICC was 0.526, indicating moderate interrater reliability for grades of ChatGPT responses. Reviewers were able to distinguish ChatGPT from human responses with 97.5% accuracy.

Conclusion: This preliminary study shows that ChatGPT provided largely complete but variably accurate responses to common rhinologic questions, highlighting important limitations in nuanced subspecialty fields.

Source journal: OTO Open (Medicine-Surgery). CiteScore: 2.70. Self-citation rate: 0.00%. Articles published: 115. Review time: 15 weeks.