The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

IF 1.2 | Zone 4, Medicine | Q3 ORTHOPEDICS | Orthopedics | Pub Date: 2024-03-01 | Epub Date: 2023-09-27 | DOI: 10.3928/01477447-20230922-05

Hayden L Hofmann, Gage A Guerra, Jonathan L Le, Alexander M Wong, Grady H Hofmann, Cory K Mayfield, Frank A Petrigliano, Joseph N Liu

Abstract

Advances in artificial intelligence and machine learning models, like Chat Generative Pre-trained Transformer (ChatGPT), have occurred at a remarkably fast rate. OpenAI released its newest model of ChatGPT, GPT-4, in March 2023. It offers a wide range of medical applications. The model has demonstrated notable proficiency on many medical board examinations. This study sought to assess GPT-4's performance on the Orthopaedic In-Training Examination (OITE) used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. The data gathered from GPT-4's performance were additionally compared with the data of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance is most comparable to that of an average third-year orthopedic surgery resident, while GPT-3.5 performed below an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.].
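The reported accuracy gap can be checked with a short calculation. The sketch below runs a pooled two-proportion z-test on the counts given in the abstract. Two assumptions are made here that the source does not confirm: the abstract does not name the statistical test the authors used, and the GPT-3.5 correct count of 190 is reconstructed from the reported 46.3% of 410 questions. The function name is illustrative only.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Pooled two-proportion z-test; returns (p_a, p_b, z, two-sided p-value)."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# GPT-4: 251 of 396 (from the abstract).
# GPT-3.5: 190 of 410 is an ASSUMPTION, back-calculated from the reported 46.3%.
p_gpt4, p_gpt35, z, p_value = two_proportion_z_test(251, 396, 190, 410)

print(f"GPT-4 accuracy:   {p_gpt4:.1%}")
print(f"GPT-3.5 accuracy: {p_gpt35:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the two-sided p-value falls below the .00001 threshold quoted in the abstract, which is consistent with the reported result, although the authors may have used a different test (for example, a chi-square test).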

Source journal: Orthopedics (Medicine, Orthopedics)

CiteScore: 2.20

Self-citation rate: 0.00%

Articles published: 160

Review time: 3 months

Journal description: For over 40 years, Orthopedics, a bimonthly peer-reviewed journal, has been the preferred choice of orthopedic surgeons for clinically relevant information on all aspects of adult and pediatric orthopedic surgery and treatment. Edited by Robert D'Ambrosia, MD, Chairman of the Department of Orthopedics at the University of Colorado, Denver, and former President of the American Academy of Orthopaedic Surgeons, as well as an Editorial Board of over 100 international orthopedists, Orthopedics is the source to turn to for guidance in your practice. The journal offers access to current articles, as well as several years of archived content. Highlights also include Blue Ribbon articles published full text in print and online, as well as Tips & Techniques posted with every issue.