The Ability of Large Language Models to Generate Patient Information Materials for Retinopathy of Prematurity: Evaluation of Readability, Accuracy, and Comprehensiveness.
Sevinç Arzu Postacı, Ali Dal
Turkish Journal of Ophthalmology, 54(6):330-336, published 2024-12-31
DOI: 10.4274/tjo.galenos.2024.58295
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11707455/pdf/
Citations: 0
Abstract
Objectives: This study compared the readability of patient education materials from the Turkish Ophthalmological Association (TOA) retinopathy of prematurity (ROP) guidelines with those generated by large language models (LLMs). The ability of GPT-4.0, GPT-4o mini, and Gemini to produce patient education materials was evaluated in terms of accuracy and comprehensiveness.
Materials and methods: Thirty questions from the TOA ROP guidelines were posed to GPT-4.0, GPT-4o mini, and Gemini. Their responses were then reformulated using the prompts "Can you revise this text to be understandable at a 6th-grade reading level?" (P1 format) and "Can you make this text easier to understand?" (P2 format). The readability of the TOA ROP guidelines and the LLM-generated responses was analyzed using the Ateşman and Bezirci-Yılmaz formulas. Additionally, ROP specialists evaluated the comprehensiveness and accuracy of the responses.
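The Ateşman formula referenced above is a Turkish adaptation of the Flesch reading-ease score: 198.825 − 40.175 × (average syllables per word) − 2.610 × (average words per sentence), with higher scores indicating easier text. A minimal sketch of how such a score could be computed is shown below; it is not the authors' code, and it assumes the simplification that Turkish syllable count equals vowel count (which holds for standard Turkish orthography) and a naive sentence splitter:

```python
import re

# Turkish vowels; in Turkish, the syllable count of a word equals its vowel count.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")

def atesman_readability(text: str) -> float:
    """Ateşman readability score for Turkish text.

    Higher is easier to read (roughly: 90-100 very easy, below 30 very difficult).
    Sentence splitting here is intentionally naive (periods, question and
    exclamation marks); a production tool would need a proper tokenizer.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)  # \w matches Unicode letters in Python 3
    syllables = sum(1 for ch in "".join(words) if ch in TURKISH_VOWELS)
    x1 = syllables / len(words)        # average syllables per word
    x2 = len(words) / len(sentences)   # average words per sentence
    return 198.825 - 40.175 * x1 - 2.610 * x2
```

Longer words and longer sentences both lower the score, which is why prompting an LLM to "revise for a 6th-grade reading level" (the P1 format above) can measurably raise it.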
Results: The TOA brochure was found to have a reading level above the 6th-grade level recommended in the literature. Materials generated by GPT-4.0 and Gemini had significantly greater readability than the TOA brochure (p<0.05). Adjustments made in the P1 and P2 formats improved readability for GPT-4.0, while no significant change was observed for GPT-4o mini and Gemini. GPT-4.0 had the highest scores for accuracy and comprehensiveness, while Gemini had the lowest.
Conclusion: GPT-4.0 appeared to have greater potential for generating more readable, accurate, and comprehensive patient education materials. However, when integrating LLMs into the healthcare field, regional medical differences and the accuracy of the provided information must be carefully assessed.
About the journal:
The Turkish Journal of Ophthalmology (TJO) is the only scientific periodical of the Turkish Ophthalmological Association and has been published since January 1929. In its early years, the journal appeared in Turkish and French. Although various challenges caused temporary interruptions, the journal has been published continuously from 1971 to the present. Its target audience comprises ophthalmology specialists and physicians in training across all relevant disciplines.