Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE)

IF 2.6 | CAS Region 3 (Medicine) | Q1 EDUCATION, SCIENTIFIC DISCIPLINES | Journal of Surgical Education | Pub Date: 2024-09-14 | DOI: 10.1016/j.jsurg.2024.08.002
{"title":"Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE)","authors":"","doi":"10.1016/j.jsurg.2024.08.002","DOIUrl":null,"url":null,"abstract":"<div><h3>OBJECTIVE</h3><p>Artificial intelligence (AI) is capable of answering complex medical examination questions, offering the potential to revolutionize medical education and healthcare delivery. In this study we aimed to assess ChatGPT, a model that has demonstrated exceptional performance on standardized exams. Specifically, our focus was on evaluating ChatGPT's performance on the complete 2019 Orthopaedic In-Training Examination (OITE), including questions with an image component. Furthermore, we explored difference in performance when questions varied by text only or text with an associated image, including whether the image was described using AI or a trained orthopaedist.</p></div><div><h3>DESIGN And SETTING</h3><p>Questions from the 2019 OITE were input into ChatGPT version 4.0 (GPT-4) using 3 response variants. As the capacity to input or interpret images is not publicly available in ChatGPT at the time of this study, questions with an image component were described and added to the OITE question using descriptions generated by Microsoft Azure AI Vision Studio or authors of the study.</p></div><div><h3>RESULTS</h3><p>ChatGPT performed equally on OITE questions with or without imaging components, with an average correct answer choice of 49% and 48% across all 3 input methods. Performance dropped by 6% when using image descriptions generated by AI. When using single answer multiple-choice input methods, ChatGPT performed nearly double the rate of random guessing, answering 49% of questions correctly. The performance of ChatGPT was worse than all resident classes on the 2019 exam, scoring 4% lower than PGY-1 residents.</p></div><div><h3>DISCUSSION</h3><p>ChatGT performed below all resident classes on the 2019 OITE. Performance on text only questions and questions with images was nearly equal if the image was described by a trained orthopaedic specialist but decreased when using an AI generated description. Recognizing the performance abilities of AI software may provide insight into the current and future applications of this technology into medical education.</p></div>","PeriodicalId":50033,"journal":{"name":"Journal of Surgical Education","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Surgical Education","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1931720424003799","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract

OBJECTIVE

Artificial intelligence (AI) is capable of answering complex medical examination questions, offering the potential to revolutionize medical education and healthcare delivery. In this study we aimed to assess ChatGPT, a model that has demonstrated exceptional performance on standardized exams. Specifically, we evaluated ChatGPT's performance on the complete 2019 Orthopaedic In-Training Examination (OITE), including questions with an image component. Furthermore, we explored differences in performance between text-only questions and questions with an associated image, including whether the image was described by AI or by a trained orthopaedist.

DESIGN AND SETTING

Questions from the 2019 OITE were input into ChatGPT version 4.0 (GPT-4) using 3 response variants. Because the capacity to input or interpret images was not publicly available in ChatGPT at the time of this study, questions with an image component were supplemented with image descriptions generated either by Microsoft Azure AI Vision Studio or by the authors of the study.
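As a concrete illustration of this kind of pipeline, the sketch below shows how a textual image description could be prepended to an OITE-style question before querying GPT-4 through the OpenAI chat API. The question content, answer options, and function names are hypothetical placeholders, not the authors' actual materials or tooling; only the general pattern (describe the image, concatenate it with the question, query the model) follows the study design.

```python
# Minimal sketch of the text-plus-image-description input pattern described
# above. The question and helper names are illustrative placeholders, not the
# study's actual materials or code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_with_description(stem: str, options: list[str], image_description: str | None) -> str:
    """Send one multiple-choice question to GPT-4, optionally prefixed with a textual image description."""
    parts = []
    if image_description:
        parts.append(f"Image description: {image_description}")
    parts.append(stem)
    parts.append("\n".join(f"{letter}. {text}" for letter, text in zip("ABCDE", options)))
    parts.append("Answer with the single best option letter.")

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "\n\n".join(parts)}],
    )
    return response.choices[0].message.content


# Hypothetical example question (not taken from the actual OITE):
print(ask_with_description(
    stem="A 25-year-old presents after an inversion ankle injury. What is the most likely injured structure?",
    options=["Deltoid ligament", "Anterior talofibular ligament", "Achilles tendon", "Spring ligament"],
    image_description="Lateral ankle radiograph showing soft-tissue swelling and no fracture.",
))
```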

RESULTS

ChatGPT performed comparably on OITE questions with and without imaging components, averaging 49% and 48% correct answers, respectively, across all 3 input methods. Performance dropped by 6% when AI-generated image descriptions were used. When using the single-answer multiple-choice input method, ChatGPT answered 49% of questions correctly, nearly double the rate expected from random guessing. ChatGPT performed worse than all resident classes on the 2019 exam, scoring 4% lower than PGY-1 residents.
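For context, the "nearly double random guessing" claim follows from a one-line calculation. Assuming each question offers k equally likely answer options (k = 4 here is an assumption for illustration; the abstract does not state the question format), the expected random score and the reported ratio are:

```latex
% Expected score from uniform random guessing over k answer options.
% Taking k = 4 is an illustrative assumption; the abstract does not
% state how many options each OITE question offers.
\mathbb{E}[\text{score}] = \frac{1}{k}
  \;\xrightarrow{\,k=4\,}\; 25\%,
\qquad
\frac{49\%}{25\%} \approx 1.96 \approx 2.
```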

DISCUSSION

ChatGPT performed below all resident classes on the 2019 OITE. Performance on text-only questions and questions with images was nearly equal when the image was described by a trained orthopaedic specialist, but decreased when an AI-generated description was used. Recognizing the performance capabilities of AI software may provide insight into the current and future applications of this technology in medical education.

Source journal: Journal of Surgical Education (EDUCATION, SCIENTIFIC DISCIPLINES; SURGERY)
CiteScore: 5.60
Self-citation rate: 10.30%
Articles per year: 261
Review turnaround: 48 days
About the journal: The Journal of Surgical Education (JSE) is dedicated to advancing the field of surgical education through original research. The journal publishes research articles in all surgical disciplines on topics relevant to the education of surgical students, residents, and fellows, as well as practicing surgeons. Our readers look to JSE for timely, innovative research findings from the international surgical education community. As the official journal of the Association of Program Directors in Surgery (APDS), JSE publishes the proceedings of the annual APDS meeting held during Surgery Education Week.