Evaluating Large Language Model–Supported Instructions for Medication Use: First Steps Toward a Comprehensive Model

Zilma Silveira Nogueira Reis, MD, PhD; Adriana Silvina Pagano, MA, PhD; Isaias Jose Ramos de Oliveira, MSc; Cristiane dos Santos Dias, MD, PhD; Eura Martins Lage, MD, PhD; Erico Franco Mineiro, PhD; Glaucia Miranda Varella Pereira, PhD; Igor de Carvalho Gomes, MSc; Vinicius Araujo Basilio, MS; Ricardo João Cruz-Correia, PhD; Davi dos Reis de Jesus, BCS; Antônio Pereira de Souza Júnior, MS; Leonardo Chaves Dutra da Rocha, PhD
Journal: Mayo Clinic Proceedings. Digital health, volume 2, issue 4, pages 632-644
Published: 2024-10-19
DOI: 10.1016/j.mcpdig.2024.09.006
URL: https://www.sciencedirect.com/science/article/pii/S2949761224001032

Abstract

Objective

To assess how large language models (LLMs) can support the generation of clearer and more personalized medication instructions to enhance e-prescription.

Patients and Methods

We established patient-centered guidelines for adequate, acceptable, and personalized directions to enhance e-prescription. A dataset comprising 104 outpatient scenarios, with an array of medications, administration routes, and patient conditions, was developed following the Brazilian national e-prescribing standard. Three prompts were submitted to a closed-source LLM. The first prompt involved a generic command, the second was calibrated for content enhancement and personalization, and the third requested bias mitigation. The third prompt was also submitted to an open-source LLM. Outputs were assessed using automated metrics and human evaluation. We conducted the study between March 1, 2024, and September 10, 2024.
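
The three-prompt design described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the wording of the prompts or the fields of a scenario, so the `Scenario` fields, the template texts, and the sample values are hypothetical. No LLM is called; the sketch only shows how one scenario could be expanded into three prompt variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One outpatient e-prescription scenario (fields are illustrative)."""
    medication: str
    route: str
    patient_condition: str


# Hypothetical wording: the study's actual prompts are not in the abstract.
PROMPT_TEMPLATES = {
    # Prompt 1: generic command.
    "generic": "Write directions for use for this prescription: {rx}.",
    # Prompt 2: calibrated for content enhancement and personalization.
    "personalized": (
        "Write clear, complete directions for use for this prescription: {rx}. "
        "Tailor them to this patient: {condition}."
    ),
    # Prompt 3: same as prompt 2 plus a bias-mitigation request.
    "bias_mitigated": (
        "Write clear, complete directions for use for this prescription: {rx}. "
        "Tailor them to this patient: {condition}. "
        "Do not make assumptions about the patient's gender."
    ),
}


def build_prompts(scenario: Scenario) -> dict[str, str]:
    """Expand one scenario into the three prompt variants."""
    rx = f"{scenario.medication}, {scenario.route} route"
    return {
        name: template.format(rx=rx, condition=scenario.patient_condition)
        for name, template in PROMPT_TEMPLATES.items()
    }


# Illustrative scenario, not taken from the study's dataset.
example = Scenario("amoxicillin 500 mg", "oral", "adult with sinusitis")
prompts = build_prompts(example)
```

Keeping the templates in one dictionary makes it easy to send every scenario through each variant and compare the outputs side by side, which is the comparison the evaluation relies on.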

Results

Adequacy scores of our closed-source LLM’s output showed the third prompt outperforming the first and second prompts. Full or partial acceptability was achieved in 94.3% of texts with the third prompt. Personalization was rated highly, especially with the second and third prompts. The 2 LLMs showed similar adequacy results. Lack of scientific evidence and factual errors were infrequent and unrelated to a particular prompt or LLM. The frequency of hallucinations differed between the LLMs and concerned prescriptions issued upon symptom manifestation and medications requiring dosage adjustment or involving intermittent use. Gender bias was found in our closed-source LLM’s output for the first and second prompts, with the third one being bias-free. The second LLM’s output was bias-free.
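
An acceptability figure of this kind is the share of texts that human raters judged fully or partially acceptable. A minimal sketch of that calculation follows, assuming each text receives a single label; the label names and the toy ratings are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def acceptability_share(ratings: list[str]) -> float:
    """Return the percentage of texts rated fully or partially acceptable."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    accepted = counts["full"] + counts["partial"]
    return 100.0 * accepted / len(ratings)


# Toy ratings for illustration only (not the study's results).
toy_ratings = ["full"] * 6 + ["partial"] * 3 + ["unacceptable"]
share = acceptability_share(toy_ratings)
```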

Conclusion

This study demonstrates the potential of LLM-supported generation to produce prescription directions and improve communication between health professionals and patients within the e-prescribing system.
Source journal: Mayo Clinic Proceedings. Digital health (subject areas: Medicine and Dentistry (General), Health Informatics, Public Health and Health Policy)