Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.

JMIR Dermatology (Q3, Medicine) · Pub Date: 2024-05-16 · DOI: 10.2196/55898
Raphaella Lambert, Zi-Yi Choo, Kelsey Gradwohl, Liesl Schroedl, Arlene Ruiz De Luzuriaga
{"title":"Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.","authors":"Raphaella Lambert, Zi-Yi Choo, Kelsey Gradwohl, Liesl Schroedl, Arlene Ruiz De Luzuriaga","doi":"10.2196/55898","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Dermatologic patient education materials (PEMs) are often written above the national average seventh- to eighth-grade reading level. ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT are large language models (LLMs) that are responsive to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels.</p><p><strong>Objective: </strong>This study aims to assess the ability of select LLMs to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels. Further, the study aims to assess the preservation of meaning across such LLM-generated PEMs, as assessed by dermatology resident trainees.</p><p><strong>Methods: </strong>The Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology PEMs was evaluated for 4 common (atopic dermatitis, acne vulgaris, psoriasis, and herpes zoster) and 4 rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, and lichen planus) dermatologic conditions. We prompted ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT to \"Create a patient education handout about [condition] at a [FKRL]\" to iteratively generate 10 PEMs per condition at unspecified fifth- and seventh-grade FKRLs, evaluated with Microsoft Word readability statistics. The preservation of meaning across LLMs was assessed by 2 dermatology resident trainees.</p><p><strong>Results: </strong>The current American Academy of Dermatology PEMs had an average (SD) FKRL of 9.35 (1.26) and 9.50 (2.3) for common and rare diseases, respectively. For common diseases, the FKRLs of LLM-produced PEMs ranged between 9.8 and 11.21 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). For rare diseases, the FKRLs of LLM-produced PEMs ranged between 9.85 and 11.45 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). At the fifth-grade reading level, GPT-4 was better at producing PEMs for both common and rare conditions than ChatGPT-3.5 (P=.001 and P=.01, respectively), DermGPT (P<.001 and P=.03, respectively), and DocsGPT (P<.001 and P=.02, respectively). At the seventh-grade reading level, no significant difference was found between ChatGPT-3.5, GPT-4, DocsGPT, or DermGPT in producing PEMs for common conditions (all P>.05); however, for rare conditions, ChatGPT-3.5 and DocsGPT outperformed GPT-4 (P=.003 and P<.001, respectively). The preservation of meaning analysis revealed that for common conditions, DermGPT ranked the highest for overall ease of reading, patient understandability, and accuracy (14.75/15, 98%); for rare conditions, handouts generated by GPT-4 ranked the highest (14.5/15, 97%).</p><p><strong>Conclusions: </strong>GPT-4 appeared to outperform ChatGPT-3.5, DocsGPT, and DermGPT at the fifth-grade FKRL for both common and rare conditions, although both ChatGPT-3.5 and DocsGPT performed better than GPT-4 at the seventh-grade FKRL for rare conditions. LLM-produced PEMs may reliably meet seventh-grade FKRLs for select common and rare dermatologic conditions and are easy to read, understandable for patients, and mostly accurate. 
LLMs may play a role in enhancing health literacy and disseminating accessible, understandable PEMs in dermatology.</p>","PeriodicalId":73553,"journal":{"name":"JMIR dermatology","volume":"7 ","pages":"e55898"},"PeriodicalIF":0.0000,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11140271/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR dermatology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/55898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Dermatologic patient education materials (PEMs) are often written above the national average seventh- to eighth-grade reading level. ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT are large language models (LLMs) that are responsive to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels.

Objective: This study aims to assess the ability of select LLMs to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels. Further, the study aims to assess the preservation of meaning across such LLM-generated PEMs, as assessed by dermatology resident trainees.

Methods: The Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology PEMs was evaluated for 4 common (atopic dermatitis, acne vulgaris, psoriasis, and herpes zoster) and 4 rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, and lichen planus) dermatologic conditions. We prompted ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT to "Create a patient education handout about [condition] at a [FKRL]" to iteratively generate 10 PEMs per condition at unspecified, fifth-, and seventh-grade FKRLs, evaluated with Microsoft Word readability statistics. The preservation of meaning across LLMs was assessed by 2 dermatology resident trainees.
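
As a rough illustration of the readability metric involved (not the authors' code), the sketch below computes the Flesch-Kincaid grade level, FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/word) − 15.59, using a crude syllable heuristic; the study itself used Microsoft Word's readability statistics, so scores will differ slightly. The prompt template mirrors the one quoted above, and the example condition and level are placeholders.

```python
# Minimal sketch, assuming a simple vowel-group syllable heuristic;
# Microsoft Word's readability statistics use a more elaborate method.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, with a silent-e adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * (len(words) / max(len(sentences), 1))
        + 11.8 * (syllables / max(len(words), 1))
        - 15.59
    )

# Prompt template quoted in the Methods; condition and level are illustrative.
prompt = "Create a patient education handout about {condition} at a {level} reading level"
print(prompt.format(condition="atopic dermatitis", level="fifth-grade"))
print(round(flesch_kincaid_grade("Atopic dermatitis makes skin dry and itchy."), 2))
```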

Results: The current American Academy of Dermatology PEMs had an average (SD) FKRL of 9.35 (1.26) and 9.50 (2.3) for common and rare diseases, respectively. For common diseases, the FKRLs of LLM-produced PEMs ranged between 9.8 and 11.21 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). For rare diseases, the FKRLs of LLM-produced PEMs ranged between 9.85 and 11.45 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). At the fifth-grade reading level, GPT-4 was better at producing PEMs for both common and rare conditions than ChatGPT-3.5 (P=.001 and P=.01, respectively), DermGPT (P<.001 and P=.03, respectively), and DocsGPT (P<.001 and P=.02, respectively). At the seventh-grade reading level, no significant difference was found between ChatGPT-3.5, GPT-4, DocsGPT, or DermGPT in producing PEMs for common conditions (all P>.05); however, for rare conditions, ChatGPT-3.5 and DocsGPT outperformed GPT-4 (P=.003 and P<.001, respectively). The preservation of meaning analysis revealed that for common conditions, DermGPT ranked the highest for overall ease of reading, patient understandability, and accuracy (14.75/15, 98%); for rare conditions, handouts generated by GPT-4 ranked the highest (14.5/15, 97%).
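
The abstract reports pairwise P values between models at each reading level but does not name the statistical test used. Purely as an illustration of how two models' FKRL distributions for 10 generated handouts might be compared, the sketch below runs a Mann-Whitney U test on made-up placeholder scores; it is not the authors' analysis and the numbers are hypothetical.

```python
# Illustrative only: the test and the data below are assumptions,
# not taken from the study.
from scipy.stats import mannwhitneyu

# Hypothetical FKRL scores for 10 fifth-grade-prompt handouts per model.
gpt4_fkrl = [4.2, 4.8, 5.1, 4.6, 5.0, 4.4, 4.9, 5.2, 4.7, 4.5]
chatgpt35_fkrl = [6.8, 7.1, 6.5, 7.4, 6.9, 7.0, 6.7, 7.3, 6.6, 7.2]

stat, p_value = mannwhitneyu(gpt4_fkrl, chatgpt35_fkrl, alternative="two-sided")
print(f"U={stat}, P={p_value:.3f}")
```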

Conclusions: GPT-4 appeared to outperform ChatGPT-3.5, DocsGPT, and DermGPT at the fifth-grade FKRL for both common and rare conditions, although both ChatGPT-3.5 and DocsGPT performed better than GPT-4 at the seventh-grade FKRL for rare conditions. LLM-produced PEMs may reliably meet seventh-grade FKRLs for select common and rare dermatologic conditions and are easy to read, understandable for patients, and mostly accurate. LLMs may play a role in enhancing health literacy and disseminating accessible, understandable PEMs in dermatology.
