Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions.

OTO Open (IF 1.8; Q2, Otorhinolaryngology). Pub Date: 2024-06-27. eCollection Date: 2024-04-01. DOI: 10.1002/oto2.164
Evan A Patel, Lindsay Fleischer, Peter Filip, Michael Eggerstedt, Michael Hutz, Elias Michaelides, Pete S Batra, Bobby A Tajudeen
OTO Open 8(2):e164. Publication type: Journal Article. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11208739/pdf/
Citations: 0

Abstract

Objective: Advances in deep learning and artificial intelligence (AI) have led to the emergence of large language models (LLMs) such as ChatGPT from OpenAI. This study aimed to evaluate the performance of ChatGPT 3.5 and GPT4 on Otolaryngology (Rhinology) Standardized Board Examination questions in comparison to otolaryngology residents.

Methods: This study selected all 127 rhinology standardized questions from www.boardvitals.com, a study tool commonly used by otolaryngology residents preparing for board exams. Ninety-three text-based questions were administered to ChatGPT 3.5 and GPT4, and their answers were compared with the average results of the question bank (used primarily by otolaryngology residents). Thirty-four image-based questions were provided to GPT4 and underwent the same analysis. Based on the findings of an earlier study, a pass-fail cutoff was set at the 10th percentile.

Results: On text-based questions, ChatGPT 3.5 answered correctly 45.2% of the time (8th percentile) (P = .0001), while GPT4 achieved 86.0% (66th percentile) (P = .001). GPT4 answered image-based questions correctly 64.7% of the time. Projections suggest that ChatGPT 3.5 might not pass the American Board of Otolaryngology Written Qualifying Exam (ABOto WQE), whereas GPT4 stands a strong chance of passing.
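
As a rough consistency check on the figures above, the following Python sketch back-computes how many correct answers each reported percentage implies and applies the 10th-percentile cutoff from the Methods. It assumes each percentage is simply correct answers divided by questions asked, rounded to one decimal place, and that the cutoff is inclusive. The recovered counts are inferred for illustration and are not reported in the abstract. The function names are hypothetical.

```python
# Back-of-envelope check of the reported accuracies.
# Assumption: each percentage = correct / total, rounded to one decimal.
# The counts recovered here are inferred, NOT stated in the abstract.

TEXT_QUESTIONS = 93    # text-based questions (from the Methods)
IMAGE_QUESTIONS = 34   # image-based questions (from the Methods)
PASS_CUTOFF_PERCENTILE = 10  # from the Methods; inclusive comparison is an assumption


def implied_correct(total, reported_pct):
    """Return every count of correct answers that rounds to reported_pct."""
    return [
        k for k in range(total + 1)
        if abs(round(100 * k / total, 1) - reported_pct) < 1e-9
    ]


def passes(percentile, cutoff=PASS_CUTOFF_PERCENTILE):
    """Apply the pass-fail cutoff to a percentile rank."""
    return percentile >= cutoff


if __name__ == "__main__":
    # The two question subsets should add up to the full bank of 127.
    print(TEXT_QUESTIONS + IMAGE_QUESTIONS)

    # Counts consistent with each reported accuracy.
    print("ChatGPT 3.5, text:", implied_correct(TEXT_QUESTIONS, 45.2))
    print("GPT4, text:", implied_correct(TEXT_QUESTIONS, 86.0))
    print("GPT4, image:", implied_correct(IMAGE_QUESTIONS, 64.7))

    # Percentile ranks against the 10th-percentile cutoff.
    print("ChatGPT 3.5 passes:", passes(8))
    print("GPT4 passes:", passes(66))
```

Under these assumptions, each reported percentage corresponds to exactly one whole number of correct answers, so the figures are internally consistent with 93 text-based and 34 image-based questions.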

Discussion: The older LLM, ChatGPT 3.5, is unlikely to pass the ABOto WQE. However, the advanced GPT4 model exhibits a much higher likelihood of success. This rapid progression in AI indicates its potential future role in otolaryngology education.

Implications for practice: As AI technology rapidly advances, AI-assisted medical education, diagnosis, and treatment planning may become commonplace in the medical and surgical landscape.

Level of evidence: Level 5.

Source journal: OTO Open (Medicine-Surgery). CiteScore: 2.70. Self-citation rate: 0.00%. Articles published: 115. Review time: 15 weeks.