Brain versus bot: Distinguishing letters of recommendation authored by humans compared with artificial intelligence

AEM Education and Training | IF 1.7 | Q2, Education, Scientific Disciplines | Pub Date: 2023-11-30 | DOI: 10.1002/aet2.10924

Carl Preiksaitis MD, Christopher Nash MD, EdM, Michael Gottlieb MD, Teresa M. Chan MD, MHPE, Al'ai Alvarez MD, Adaira Landry MD


Citation count: 0

Abstract

Objectives

Letters of recommendation (LORs) are essential within academic medicine, affecting a number of important decisions regarding advancement, yet they take significant time and labor to prepare. Generative artificial intelligence (AI) tools, such as ChatGPT, are gaining popularity for a variety of academic writing tasks and offer an innovative solution to relieve the burden of letter writing. It is yet to be determined whether ChatGPT could aid in crafting LORs, particularly in high-stakes contexts like faculty promotion. To assess the feasibility of this process and whether there is a significant difference between AI- and human-authored letters, we conducted a study to determine whether academic physicians can distinguish between the two.

Methods

A quasi-experimental study was conducted using a single-blind design. Academic physicians with experience in reviewing LORs were presented with LORs for promotion to associate professor, written by either humans or AI. Participants reviewed the LORs and identified the authorship of each. Statistical analysis was performed to determine accuracy in distinguishing between human- and AI-authored LORs. Additionally, the perceived quality and persuasiveness of the LORs were compared based on suspected and actual authorship.
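The accuracy measure described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names, the toy labels, and the choice of a one-sided exact binomial test against chance (50%) are all assumptions made for illustration, and the abstract does not state which statistical tests were used.

```python
from math import comb


def participant_accuracy(guesses, truths):
    """Fraction of letters whose authorship one reviewer identified correctly."""
    if len(guesses) != len(truths) or not guesses:
        raise ValueError("guesses and truths must be non-empty and of equal length")
    correct = sum(g == t for g, t in zip(guesses, truths))
    return correct / len(guesses)


def mean_accuracy(per_participant):
    """Unweighted mean of per-reviewer accuracies."""
    if not per_participant:
        raise ValueError("per_participant must be non-empty")
    return sum(per_participant) / len(per_participant)


def binomial_p_at_least(k, n, p=0.5):
    """One-sided exact binomial tail: P(X >= k) for X ~ Binomial(n, p).

    Assumption: judgments are treated as independent trials, which
    ignores clustering of judgments within a reviewer.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical toy data, not taken from the study.
guesses = ["human", "ai", "ai", "human"]
truths = ["human", "human", "ai", "ai"]
accuracy = participant_accuracy(guesses, truths)
```

A real analysis would likely need a model that accounts for repeated judgments by the same reviewer; the binomial tail here is only the simplest possible comparison with chance.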

Results

A total of 32 participants completed letter review. The mean accuracy in distinguishing human- from AI-authored LORs was 59.4%. Reviewers' certainty and time spent deliberating did not significantly affect accuracy. LORs suspected to be human-authored were rated more favorably in terms of quality and persuasiveness. A difference in gender-biased language was observed in our letters: human-authored letters contained significantly more female-associated words, while the majority of AI-authored letters tended to use more male-associated words.
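The gendered-language comparison reported above can be pictured as a word count against two lexicons. The sketch below is an assumption-laden illustration only: the abstract does not say which tool or word lists were used, and the two small sets here are placeholders invented for the example, not the study's lexicon.

```python
import re
from collections import Counter

# Placeholder lexicons for illustration only; NOT the study's word lists.
FEMALE_ASSOCIATED = frozenset({"compassionate", "supportive", "warm"})
MALE_ASSOCIATED = frozenset({"ambitious", "confident", "decisive"})


def gendered_word_counts(letter_text):
    """Count tokens in a letter that appear in each placeholder lexicon."""
    tokens = re.findall(r"[a-z']+", letter_text.lower())
    counts = Counter()
    for token in tokens:
        if token in FEMALE_ASSOCIATED:
            counts["female"] += 1
        elif token in MALE_ASSOCIATED:
            counts["male"] += 1
    return {"female": counts["female"], "male": counts["male"]}


def dominant_association(letter_text):
    """Label a letter by whichever lexicon it matches more often."""
    counts = gendered_word_counts(letter_text)
    if counts["female"] > counts["male"]:
        return "female"
    if counts["male"] > counts["female"]:
        return "male"
    return "balanced"
```

Counting exact token matches is deliberately crude; a validated lexicon with stemming would be needed before drawing any conclusion about bias in real letters.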

Conclusions

Participants were unable to reliably differentiate between human- and AI-authored LORs for promotion. AI may be able to generate LORs and relieve the burden of letter writing for academicians. New strategies, policies, and guidelines are needed to realize the benefits of AI while preserving integrity and fairness in academic promotion decisions.


Source journal

AEM Education and Training Nursing-Emergency Nursing

CiteScore

2.60

Self-citation rate

22.20%

Articles published