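Artificial intelligence in orthopaedic education: A comparative analysis of ChatGPT and Bing AI's Orthopaedic In-Training Examination performance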

Medicine Advances. Pub Date: 2024-09-15. DOI: 10.1002/med4.77
Clark J. Chen, Vivek K. Bilolikar, Duncan VanNest, James Raphael, Gene Shaffer
Artificial intelligence in orthopaedic education: A comparative analysis of ChatGPT and Bing AI's Orthopaedic In-Training Examination performance

Background

This study evaluated the performance of generative artificial intelligence (AI) models on the Orthopaedic In-Training Examination (OITE), an annual exam administered to residents in U.S. orthopaedic residency programs.

Methods

ChatGPT 3.5 and Bing AI GPT 4.0 were evaluated on standardised sets of multiple-choice questions drawn from the American Academy of Orthopaedic Surgeons OITE online question bank spanning 5 years (2018–2022). A total of 1165 questions were posed to each AI system. To standardise performance, both systems were tested using the latest available versions of ChatGPT 3.5 and Bing AI GPT 4.0. Historical resident scores taken from the annual OITE technical reports were used for comparison.

Results

Across the five datasets, ChatGPT 3.5 scored an average of 55.0% on the OITE questions. Bing AI GPT 4.0 scored higher, with an average of 80.0%. In comparison, the average performance of orthopaedic residents in nationally accredited programs was 62.1%. Bing AI GPT 4.0 outperformed both ChatGPT 3.5 and Accreditation Council for Graduate Medical Education examinees, and analysis of variance demonstrated a significant difference among groups (p < 0.001). The best performance was by Bing AI GPT 4.0 on OITE 2020.

Conclusion

Generative AI can provide logical context for its answers through in-depth information searches and citation of resources. This combination makes a convincing case for the possible use of AI in medical education as an interactive learning aid.
