Can Artificial Intelligence Deceive Residency Committees? A Randomized Multicenter Analysis of Letters of Recommendation.

Journal of the American Academy of Orthopaedic Surgeons; Impact Factor 2.6; Zone 2 (Medicine); JCR Q1 (Orthopedics); Pub Date: 2024-12-12; DOI: 10.5435/JAAOS-D-24-00438
Samuel K Simister, Eric G Huish, Eugene Y Tsai, Hai V Le, Andrea Halim, Dominick Tuason, John P Meehan, Holly B Leshikar, Augustine M Saiz, Zachary C Lum
Citations: 0

Abstract

Introduction: The introduction of generative artificial intelligence (AI) may have a profound effect on residency applications. In this study, we explore the capabilities of AI-generated letters of recommendation (LORs) by evaluating how accurately orthopaedic surgery residency selection committee members can identify whether an LOR was written by a human or an AI author.

Methods: In a multicenter, single-blind trial, a total of 45 LORs (15 human, 15 ChatGPT, and 15 Google BARD) were curated. In random order, seven faculty reviewers from four residency programs were asked to grade each of the 45 LORs on the 11 characteristics outlined in the American Orthopaedic Association's standardized LOR. They also rated, on a 1 to 10 scale, how they would rank the applicant and their desire to have the applicant in the program, and indicated whether they thought the letter was generated by a human or an AI author. Analysis included descriptive statistics, ordinal regression, and a receiver operating characteristic curve to compare accuracy based on the number of letters reviewed.
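The review design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract says the letters were presented "in a random fashion" but does not describe the procedure, so the independent per-reviewer shuffle, the seed, and the letter and reviewer labels below are all hypothetical. Only the counts (15 letters per source, 7 reviewers) come from the abstract.

```python
import random

# Hypothetical labels; the abstract reports only the counts
# (15 letters per source, 7 reviewers).
SOURCES = ("human", "chatgpt", "bard")
LETTERS = [f"{src}-{i:02d}" for src in SOURCES for i in range(1, 16)]
REVIEWERS = [f"reviewer-{n}" for n in range(1, 8)]


def build_review_orders(letters, reviewers, seed=0):
    """Give every reviewer an independently shuffled copy of the full letter set.

    Assumption: an independent shuffle per reviewer. The study's actual
    randomization procedure is not described in the abstract.
    """
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    orders = {}
    for reviewer in reviewers:
        order = list(letters)
        rng.shuffle(order)
        orders[reviewer] = order
    return orders


orders = build_review_orders(LETTERS, REVIEWERS)
total_ratings = sum(len(order) for order in orders.values())
print(total_ratings)  # 7 reviewers x 45 letters = 315
```

The total of 315 authorship judgments matches the denominators reported in the Results (105 for human letters plus 210 for AI letters).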

Results: Faculty reviewers correctly identified 40% (42/105) of human-generated and 63% (132/210) of AI-generated letters (P < 0.001), and this accuracy did not increase over time (AUC 0.451, P = 0.102). When analyzed by perceived author, letters marked as human-generated had significantly higher means for all variables (P = 0.01). BARD-generated letters scored markedly higher than human-written letters in accuracy (3.25 [1.79 to 5.92], P < 0.001), adaptability (1.29 [1.02 to 1.65], P = 0.034), and perceived commitment (1.56 [0.99 to 2.47], P < 0.055). Additional analysis controlling for reviewer background showed no differences in outcomes based on experience or familiarity with the AI programs.
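The identification rates above follow directly from the reported counts. The short sketch below reproduces them and also derives a pooled accuracy across all 315 judgments. The pooled value is a derived figure for illustration, not one the abstract reports, and the variable names are hypothetical.

```python
# Counts taken directly from the Results paragraph.
human_correct, human_total = 42, 105
ai_correct, ai_total = 132, 210

human_rate = human_correct / human_total
ai_rate = ai_correct / ai_total

# Derived here, not reported in the abstract: accuracy pooled over all judgments.
pooled_rate = (human_correct + ai_correct) / (human_total + ai_total)

print(f"human letters identified: {human_rate:.1%}")   # 40.0%
print(f"AI letters identified:    {ai_rate:.1%}")      # 62.9%
print(f"pooled accuracy:          {pooled_rate:.1%}")  # 55.2%
```

On these counts, reviewers were right on roughly 55% of all judgments, only modestly better than a coin flip.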

Conclusion: Faculty members failed to distinguish human-generated from AI-generated LORs 50% of the time, which suggests that AI can generate LORs comparable to those written by human authors. This highlights the need for selection committees to reconsider the role and influence of LORs in residency applications.
