Journal: Orthopedics (peer-reviewed, bimonthly). DOI: 10.3928/01477447-20240304-02. Epub: 2024-03-12; print publication date: 2024-05-01. Publication type: Journal Article.
Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.
Background: Artificial intelligence (AI) generative large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination.
Materials and methods: Open AI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems that contained images were input without and then with a text-based description of the imaging findings.
Results: ChatGPT answered 69.1% of questions correctly. When provided with text describing accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly. This increased to 58% correct when text describing imaging in question stems was provided (P<.0001). ChatGPT was most accurate in questions within the shoulder category, with 90.9% correct. Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%).
Conclusion: There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous potential roles in the future in orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community is required to safely adopt these tools and minimize risks associated with their use. [Orthopedics. 2024;47(3):e146-e150.].
Journal description:
For over 40 years, Orthopedics, a bimonthly peer-reviewed journal, has been the preferred choice of orthopedic surgeons for clinically relevant information on all aspects of adult and pediatric orthopedic surgery and treatment. It is edited by Robert D'Ambrosia, MD, Chairman of the Department of Orthopedics at the University of Colorado, Denver, and former President of the American Academy of Orthopaedic Surgeons, together with an Editorial Board of over 100 international orthopedists, making Orthopedics the source to turn to for guidance in your practice.
The journal offers access to current articles as well as several years of archived content. Highlights also include Blue Ribbon articles published in full text in print and online, as well as Tips & Techniques posted with every issue.