The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Journal: Orthopedics (IF 1.1; CAS Zone 4, Medicine; JCR Q3, Orthopedics). Pub Date: 2024-03-01. Epub Date: 2023-09-27. DOI: 10.3928/01477447-20230922-05
Hayden L Hofmann, Gage A Guerra, Jonathan L Le, Alexander M Wong, Grady H Hofmann, Cory K Mayfield, Frank A Petrigliano, Joseph N Liu

Abstract

Advances in artificial intelligence and machine learning models, like Chat Generative Pre-trained Transformer (ChatGPT), have occurred at a remarkably fast rate. OpenAI released its newest model of ChatGPT, GPT-4, in March 2023. It offers a wide range of medical applications. The model has demonstrated notable proficiency on many medical board examinations. This study sought to assess GPT-4's performance on the Orthopaedic In-Training Examination (OITE) used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. The data gathered from GPT-4's performance were additionally compared with the data of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance is most comparable to that of an average third-year orthopedic surgery resident, while GPT-3.5 performed below an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.].
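
The abstract reports the accuracy comparison but does not name the statistical test used. As a rough check on the reported figures, the sketch below runs a pooled two-proportion z-test, which is an assumption and not necessarily the authors' method. The GPT-3.5 correct count is not given in the abstract; it is reconstructed here from the reported 46.3% of 410 questions, so treat it as approximate. The function and variable names are illustrative.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Pooled two-proportion z-test; returns (p_a, p_b, z, two-sided p-value)."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Counts reported in the abstract for GPT-4.
gpt4_correct, gpt4_total = 251, 396

# Assumption: the GPT-3.5 count is back-calculated from 46.3% of 410.
gpt35_total = 410
gpt35_correct = round(0.463 * gpt35_total)

p_gpt4, p_gpt35, z, p_value = two_proportion_z_test(
    gpt4_correct, gpt4_total, gpt35_correct, gpt35_total
)

print(f"GPT-4 accuracy:   {p_gpt4:.1%}")
print(f"GPT-3.5 accuracy: {p_gpt35:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the p-value falls below .00001, which is consistent with the significance level stated in the abstract. A chi-square or Fisher exact test on the same 2×2 table would be a reasonable alternative.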

Source journal
Orthopedics (Medicine: Orthopedics)
CiteScore: 2.20
Self-citation rate: 0.00%
Publication volume: 160
Review time: 3 months
Journal description: For over 40 years, Orthopedics, a bimonthly peer-reviewed journal, has been the preferred choice of orthopedic surgeons for clinically relevant information on all aspects of adult and pediatric orthopedic surgery and treatment. Edited by Robert D'Ambrosia, MD, Chairman of the Department of Orthopedics at the University of Colorado, Denver, and former President of the American Academy of Orthopaedic Surgeons, as well as an Editorial Board of over 100 international orthopedists, Orthopedics is the source to turn to for guidance in your practice. The journal offers access to current articles, as well as several years of archived content. Highlights also include Blue Ribbon articles published full text in print and online, as well as Tips & Techniques posted with every issue.