Large Language Model Use in Radiology Residency Applications: Unwelcomed but Inevitable.

Emile B Gordon, Charles Maxfield, Robert French, Laura J Fish, Jacob Romm, Emily Barre, Erica Kinne, Ryan Peterson, Lars J Grimm
Journal of the American College of Radiology (JACR) · Published online 2024-09-17 · DOI: 10.1016/j.jacr.2024.08.027

Abstract

Objective: This study explores radiology program directors' perspectives on residency applicants' use of large language models (LLMs) to craft personal statements.

Methods: Eight program directors from the Radiology Residency Education Research Alliance (RRERA) participated in a mixed-methods study comprising a survey of impressions of AI-generated personal statements and focus group discussions (July 2023). Each director reviewed four personal statement variations for five applicants, blinded to author type: the original and three ChatGPT-4.0 versions generated with varying prompts, aggregated for analysis. A 5-point Likert scale was used to rate writing quality, including voice, clarity, engagement, and organization, as well as the perceived origin of each statement. An experienced qualitative researcher facilitated the focus group discussions. Data were analyzed using a rapid analytic approach, with a coding template capturing key areas related to residency applications.

Results: GPT-generated statements (GPT) were more often rated average or worse in quality (56% [268/475]) than human-authored statements (Hu) (29% [45/160]). Although reviewers were not confident in their ability to distinguish the origin of personal statements, they did so reliably and consistently, identifying 95% (38/40) of human-authored personal statements as probably or definitely original. Focus group discussions highlighted the inevitable use of AI in crafting personal statements and raised concerns about its impact on the authenticity and value of the personal statement in residency selection. Program directors were divided on the appropriate use and regulation of AI.

Discussion: Radiology residency program directors rated LLM-generated personal statements as lower in quality and expressed concern about the loss of the applicant's voice but acknowledged the inevitability of increased AI use in the generation of application statements.
