Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.

IF 1.1; Zone 4 (Medicine); Q3 ORTHOPEDICS; Orthopedics; Pub Date: 2024-05-01; Epub Date: 2024-03-12; DOI: 10.3928/01477447-20240304-02
Marc Lubitz, Luke Latario
Citations: 0

Abstract

Background: Artificial intelligence (AI) generative large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination.

Materials and methods: OpenAI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems that contained images were input first without, and then with, a text-based description of the imaging findings.

Results: ChatGPT answered 69.1% of questions correctly. When provided with text describing accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly. This increased to 58% correct when text describing imaging in question stems was provided (P<.0001). ChatGPT was most accurate on questions in the shoulder category, with 90.9% correct. Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%).
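The reported accuracies can be compared in absolute percentage points with a short Python sketch. The figures are copied from the Results paragraph. The abstract does not say which ChatGPT condition was compared with the resident mean, so the sketch assumes the text-only score; the dictionary keys and the `point_gain` helper are illustrative names, not part of the study.

```python
# Accuracy figures (percent correct) as reported in the Results paragraph.
reported = {
    "ChatGPT": {"text_only": 69.1, "with_image_descriptions": 77.8},
    "Bard": {"text_only": 49.8, "with_image_descriptions": 58.0},
}
RESIDENT_MEAN = 66.0  # published ACGME resident mean cited in the abstract


def point_gain(before: float, after: float) -> float:
    """Absolute change in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


for model, scores in reported.items():
    gain = point_gain(scores["text_only"], scores["with_image_descriptions"])
    print(f"{model}: +{gain} percentage points with image descriptions")

# Assumption: the resident comparison uses ChatGPT's text-only score.
margin = point_gain(RESIDENT_MEAN, reported["ChatGPT"]["text_only"])
print(f"ChatGPT (text only) vs. resident mean: +{margin} percentage points")
```

These are absolute differences (8.7 points for ChatGPT, 8.2 for Bard), not relative improvements; the abstract does not report the number of questions, so the gains cannot be converted into question counts.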

Conclusion: There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous potential roles in the future in orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community are required to safely adopt these tools and minimize risks associated with their use. [Orthopedics. 2024;47(3):e146-e150.]

Source journal: Orthopedics
Category: Medicine, Orthopedics
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 160
Review time: 3 months
Journal description: For over 40 years, Orthopedics, a bimonthly peer-reviewed journal, has been the preferred choice of orthopedic surgeons for clinically relevant information on all aspects of adult and pediatric orthopedic surgery and treatment. Edited by Robert D'Ambrosia, MD, Chairman of the Department of Orthopedics at the University of Colorado, Denver, and former President of the American Academy of Orthopaedic Surgeons, as well as an Editorial Board of over 100 international orthopedists, Orthopedics is the source to turn to for guidance in your practice. The journal offers access to current articles, as well as several years of archived content. Highlights also include Blue Ribbon articles published full text in print and online, as well as Tips & Techniques posted with every issue.