Qais Dihan, Muhammad Z Chauhan, Taher K Eleiwa, Andrew D Brown, Amr K Hassan, Mohamed M Khodeiry, Reem H Elsheikh, Isdin Oke, Bharti R Nihalani, Deborah K VanderVeen, Ahmed B Sallam, Abdelrahman M Elhusseiny
{"title":"大语言模型:儿科白内障患者教育的新领域。","authors":"Qais Dihan, Muhammad Z Chauhan, Taher K Eleiwa, Andrew D Brown, Amr K Hassan, Mohamed M Khodeiry, Reem H Elsheikh, Isdin Oke, Bharti R Nihalani, Deborah K VanderVeen, Ahmed B Sallam, Abdelrahman M Elhusseiny","doi":"10.1136/bjo-2024-325252","DOIUrl":null,"url":null,"abstract":"<p><strong>Background/aims: </strong>This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4, and Google Bard) to generate novel patient education materials (PEMs) and improve the readability of existing PEMs on paediatric cataract.</p><p><strong>Methods: </strong>We compared LLMs' responses to three prompts. Prompt A requested they write a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A and requested the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C rewrote existing PEMs on paediatric cataract 'to a sixth-grade reading level using the SMOG readability formula'. Responses were compared on their quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool (≥70%: understandable, ≥70%: actionable)), accuracy (Likert misinformation; 1 (no misinformation) to 5 (high misinformation) and readability (SMOG, Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).</p><p><strong>Results: </strong>All LLM-generated responses were of high-quality (median DISCERN ≥4), understandability (≥70%), and accuracy (Likert=1). All LLM-generated responses were not actionable (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than prompt A responses (p<0.001). 
ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote them to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).</p><p><strong>Conclusion: </strong>LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.</p>","PeriodicalId":9313,"journal":{"name":"British Journal of Ophthalmology","volume":null,"pages":null},"PeriodicalIF":3.7000,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large language models: a new frontier in paediatric cataract patient education.\",\"authors\":\"Qais Dihan, Muhammad Z Chauhan, Taher K Eleiwa, Andrew D Brown, Amr K Hassan, Mohamed M Khodeiry, Reem H Elsheikh, Isdin Oke, Bharti R Nihalani, Deborah K VanderVeen, Ahmed B Sallam, Abdelrahman M Elhusseiny\",\"doi\":\"10.1136/bjo-2024-325252\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background/aims: </strong>This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4, and Google Bard) to generate novel patient education materials (PEMs) and improve the readability of existing PEMs on paediatric cataract.</p><p><strong>Methods: </strong>We compared LLMs' responses to three prompts. Prompt A requested they write a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A and requested the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C rewrote existing PEMs on paediatric cataract 'to a sixth-grade reading level using the SMOG readability formula'. 
Responses were compared on their quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool (≥70%: understandable, ≥70%: actionable)), accuracy (Likert misinformation; 1 (no misinformation) to 5 (high misinformation) and readability (SMOG, Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).</p><p><strong>Results: </strong>All LLM-generated responses were of high-quality (median DISCERN ≥4), understandability (≥70%), and accuracy (Likert=1). All LLM-generated responses were not actionable (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than prompt A responses (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote them to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).</p><p><strong>Conclusion: </strong>LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.</p>\",\"PeriodicalId\":9313,\"journal\":{\"name\":\"British Journal of Ophthalmology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"British Journal of Ophthalmology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1136/bjo-2024-325252\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"British Journal of 
Ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1136/bjo-2024-325252","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Large language models: a new frontier in paediatric cataract patient education.
Background/aims: This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4, and Google Bard) to generate novel patient education materials (PEMs) and improve the readability of existing PEMs on paediatric cataract.
Methods: We compared LLMs' responses to three prompts. Prompt A requested they write a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A and requested the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C rewrote existing PEMs on paediatric cataract 'to a sixth-grade reading level using the SMOG readability formula'. Responses were compared on their quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool (≥70%: understandable, ≥70%: actionable)), accuracy (Likert misinformation; 1 (no misinformation) to 5 (high misinformation)) and readability (SMOG, Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).
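The two readability formulas used above are simple arithmetic over sentence, word, and syllable counts. As a rough illustration (not the validated tooling the study used), a minimal sketch of SMOG and FKGL might look like the following; the regex-based syllable counter is a naive approximation of my own, so scores will differ slightly from dictionary-based implementations:

```python
import math
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count groups of consecutive vowels.
    Real readability tools use dictionaries or better heuristics."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_grade(text: str) -> float:
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    where polysyllables are words of 3+ syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level =
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Both formulas output an approximate US school grade level, which is why the prompts could target a concrete 'sixth-grade' threshold (score below 7).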
Results: All LLM-generated responses scored highly on quality (median DISCERN ≥4), understandability (≥70%) and accuracy (Likert=1). No LLM-generated responses were actionable (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than prompt A responses (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote them to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).
Conclusion: LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.
About the journal:
The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology. It also publishes major reviews and manuscripts covering regional issues in a global context.