Brain versus bot: Distinguishing letters of recommendation authored by humans compared with artificial intelligence

AEM Education and Training · IF 1.7 · Q2 (Education, Scientific Disciplines) · Pub Date: 2023-11-30 · DOI: 10.1002/aet2.10924
Carl Preiksaitis MD, Christopher Nash MD, EdM, Michael Gottlieb MD, Teresa M. Chan MD, MHPE, Al'ai Alvarez MD, Adaira Landry MD
{"title":"Brain versus bot: Distinguishing letters of recommendation authored by humans compared with artificial intelligence","authors":"Carl Preiksaitis MD,&nbsp;Christopher Nash MD, EdM,&nbsp;Michael Gottlieb MD,&nbsp;Teresa M. Chan MD, MHPE,&nbsp;Al'ai Alvarez MD,&nbsp;Adaira Landry MD","doi":"10.1002/aet2.10924","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Objectives</h3>\n \n <p>Letters of recommendation (LORs) are essential within academic medicine, affecting a number of important decisions regarding advancement, yet these letters take significant amounts of time and labor to prepare. The use of generative artificial intelligence (AI) tools, such as ChatGPT, are gaining popularity for a variety of academic writing tasks and offer an innovative solution to relieve the burden of letter writing. It is yet to be determined if ChatGPT could aid in crafting LORs, particularly in high-stakes contexts like faculty promotion. To determine the feasibility of this process and whether there is a significant difference between AI and human-authored letters, we conducted a study aimed at determining whether academic physicians can distinguish between the two.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>A quasi-experimental study was conducted using a single-blind design. Academic physicians with experience in reviewing LORs were presented with LORs for promotion to associate professor, written by either humans or AI. Participants reviewed LORs and identified the authorship. Statistical analysis was performed to determine accuracy in distinguishing between human and AI-authored LORs. Additionally, the perceived quality and persuasiveness of the LORs were compared based on suspected and actual authorship.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>A total of 32 participants completed letter review. The mean accuracy of distinguishing between human- versus AI-authored LORs was 59.4%. The reviewer's certainty and time spent deliberating did not significantly impact accuracy. LORs suspected to be human-authored were rated more favorably in terms of quality and persuasiveness. A difference in gender-biased language was observed in our letters: human-authored letters contained significantly more female-associated words, while the majority of AI-authored letters tended to use more male-associated words.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>Participants were unable to reliably differentiate between human- and AI-authored LORs for promotion. AI may be able to generate LORs and relieve the burden of letter writing for academicians. 
New strategies, policies, and guidelines are needed to balance the benefits of AI while preserving integrity and fairness in academic promotion decisions.</p>\n </section>\n </div>","PeriodicalId":37032,"journal":{"name":"AEM Education and Training","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aet2.10924","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AEM Education and Training","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/aet2.10924","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract

Objectives

Letters of recommendation (LORs) are essential within academic medicine, affecting a number of important decisions regarding advancement, yet these letters take significant time and labor to prepare. Generative artificial intelligence (AI) tools, such as ChatGPT, are gaining popularity for a variety of academic writing tasks and offer an innovative way to relieve the burden of letter writing. Whether ChatGPT can aid in crafting LORs, particularly in high-stakes contexts such as faculty promotion, has yet to be determined. To assess the feasibility of this process and whether AI- and human-authored letters differ meaningfully, we conducted a study examining whether academic physicians can distinguish between the two.
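
The abstract does not disclose the prompts or model version used to generate the letters. As a rough illustration of the workflow only, the sketch below drafts a promotion letter with the OpenAI Python SDK; the prompt wording, model name, and candidate details are all hypothetical.

```python
# Hypothetical sketch only: the study's actual prompts, model version, and
# letter inputs are not reported in the abstract. Names and prompt wording
# below are invented for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write a letter of recommendation supporting the promotion of "
    "Dr. Jane Doe to associate professor of emergency medicine. "
    "Highlight her clinical teaching, scholarship, and national service."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; the abstract says only "ChatGPT"
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```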

Methods

A quasi-experimental study was conducted using a single-blind design. Academic physicians with experience reviewing LORs were presented with LORs for promotion to associate professor written by either humans or AI. Participants reviewed each letter and identified its suspected authorship. Statistical analysis was performed to determine accuracy in distinguishing human- from AI-authored LORs. Additionally, the perceived quality and persuasiveness of the LORs were compared by suspected and actual authorship.
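
The abstract does not name the specific statistical tests used. One plausible core of such an analysis, sketched below with invented reviewer data, is to compute accuracy and test it against the 50% chance level with an exact binomial test.

```python
# Invented data illustrating one plausible analysis: score each reviewer
# guess against the true authorship, then test the number of correct
# guesses against chance (p = 0.5) with an exact binomial test.
from scipy.stats import binomtest

guesses = ["human", "ai", "ai", "human", "ai", "human", "ai", "ai"]
truth   = ["human", "human", "ai", "human", "ai", "ai", "ai", "human"]

correct = sum(g == t for g, t in zip(guesses, truth))
accuracy = correct / len(truth)
print(f"accuracy = {accuracy:.1%}")  # 62.5% on this toy data

# Two-sided exact test of H0: reviewers guess at chance level.
result = binomtest(correct, n=len(truth), p=0.5)
print(f"p-value = {result.pvalue:.3f}")
```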

Results

A total of 32 participants completed the letter review. The mean accuracy in distinguishing human- from AI-authored LORs was 59.4%. Reviewers' certainty and time spent deliberating did not significantly affect accuracy. LORs suspected to be human-authored were rated more favorably for quality and persuasiveness. A difference in gender-biased language was also observed: human-authored letters contained significantly more female-associated words, while most AI-authored letters used more male-associated words.
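
The abstract does not specify how gendered language was scored. A common approach is to count occurrences of words from published male- and female-associated lexicons (e.g., Gaucher et al., 2011); the sketch below illustrates the idea with small stand-in word lists, not the instrument the authors used.

```python
# Illustrative gendered-language tally: count tokens that appear in tiny
# stand-in male- and female-associated word lists. Real analyses use
# published lexicons, not these five-word examples.
import re

FEMALE_WORDS = {"compassionate", "supportive", "nurturing", "warm", "helpful"}
MALE_WORDS = {"confident", "decisive", "independent", "ambitious", "leader"}

def gendered_counts(letter: str) -> dict:
    tokens = re.findall(r"[a-z']+", letter.lower())
    return {
        "female": sum(t in FEMALE_WORDS for t in tokens),
        "male": sum(t in MALE_WORDS for t in tokens),
    }

print(gendered_counts(
    "She is a compassionate, supportive teacher and a confident leader."
))  # {'female': 2, 'male': 2}
```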

Conclusions

Participants were unable to reliably differentiate between human- and AI-authored LORs for promotion. AI may be able to generate LORs and relieve the burden of letter writing for academicians. New strategies, policies, and guidelines are needed to balance the benefits of AI while preserving integrity and fairness in academic promotion decisions.

Source journal

AEM Education and Training (Nursing - Emergency Nursing)
CiteScore: 2.60
Self-citation rate: 22.20%
Articles: 89

Latest articles in this journal

Issue Information
Paths to learning: How residents navigate transience in supervisory relationships in the emergency department
General emergency physician perceptions of caring for children: A qualitative interview study
5, 4, 3, 2, 1, 0: An evidence-based mnemonic to aid recall and interpretation of heart rate values for pediatric patients presenting for acute care
“Ardor and diligence”: Quantifying the faculty effort needed in emergency medicine graduate medical education