Hossam A Zaki, Michelle Mai, Hazem Abdel-Megid, Sabrina Q R Liew, Simon Kidanemariam, Abdifatah S Omar, Urvi Tiwari, Jad Hamze, Sun Ho Ahn, Aaron W P Maxwell
{"title":"Using ChatGPT to Improve Readability of Interventional Radiology Procedure Descriptions.","authors":"Hossam A Zaki, Michelle Mai, Hazem Abdel-Megid, Sabrina Q R Liew, Simon Kidanemariam, Abdifatah S Omar, Urvi Tiwari, Jad Hamze, Sun Ho Ahn, Aaron W P Maxwell","doi":"10.1007/s00270-024-03803-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This project examines ChatGPT's potential to enhance the readability of patient educational materials about interventional radiology (IR) procedures.</p><p><strong>Methods and materials: </strong>The descriptions of IR procedures from the Cardiovascular and Interventional Radiological Society of Europe (CIRSE) were used as the original text. Readability scores were calculated using three metrics: Flesch Reading Ease (FRE), Gunning Fog (GF), and the Automated Readability Index (ARI) using an online calculator ( https://readabilityformulas.com ). FRE is scored on a scale of 0-100, where 100 indicates easy-to-read texts, and GF and ARI represent the grade level required to comprehend the text. The DISCERN instrument measured credibility and reliability. ChatGPT was prompted to simplify the texts to a fifth-grade reading level, with subsequent recalculation of readability and DISCERN scores for comparison. Statistical significance was determined using a Wilcoxon Signed-Rank Test. Articles were subsequently organized by subgroups and analyzed.</p><p><strong>Results: </strong>73 interventional radiology procedures from CIRSE were analyzed. The original FRE score was 47.2 (Difficult), improved to 78.4 (Fairly Easy) by ChatGPT. GF and ARI scores dropped from 14.4 and 11.2 to 7.8 and 5.8, respectively, after simplification, showing significant improvement (p < 0.001). However, the average DISCERN score decreased from 3.73 to 2.99 (p < 0.001) post-ChatGPT simplification.</p><p><strong>Conclusion: </strong>This study shows ChatGPT's ability to make interventional radiology descriptions more readable but highlights its struggle to maintain the original's reliability, suggesting the need for human review and prompt engineering to enhance outcomes.</p><p><strong>Level of evidence: </strong>Level 6.</p>","PeriodicalId":9591,"journal":{"name":"CardioVascular and Interventional Radiology","volume":null,"pages":null},"PeriodicalIF":2.8000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CardioVascular and Interventional Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00270-024-03803-z","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/7/9 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"CARDIAC & CARDIOVASCULAR SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose: This project examines ChatGPT's potential to enhance the readability of patient educational materials about interventional radiology (IR) procedures.
Methods and materials: The descriptions of IR procedures from the Cardiovascular and Interventional Radiological Society of Europe (CIRSE) were used as the original text. Readability scores were calculated using three metrics: Flesch Reading Ease (FRE), Gunning Fog (GF), and the Automated Readability Index (ARI) using an online calculator ( https://readabilityformulas.com ). FRE is scored on a scale of 0-100, where 100 indicates easy-to-read texts, and GF and ARI represent the grade level required to comprehend the text. The DISCERN instrument measured credibility and reliability. ChatGPT was prompted to simplify the texts to a fifth-grade reading level, with subsequent recalculation of readability and DISCERN scores for comparison. Statistical significance was determined using a Wilcoxon Signed-Rank Test. Articles were subsequently organized by subgroups and analyzed.
Results: 73 interventional radiology procedures from CIRSE were analyzed. The original FRE score was 47.2 (Difficult), improved to 78.4 (Fairly Easy) by ChatGPT. GF and ARI scores dropped from 14.4 and 11.2 to 7.8 and 5.8, respectively, after simplification, showing significant improvement (p < 0.001). However, the average DISCERN score decreased from 3.73 to 2.99 (p < 0.001) post-ChatGPT simplification.
Conclusion: This study shows ChatGPT's ability to make interventional radiology descriptions more readable but highlights its struggle to maintain the original's reliability, suggesting the need for human review and prompt engineering to enhance outcomes.
期刊介绍:
CardioVascular and Interventional Radiology (CVIR) is the official journal of the Cardiovascular and Interventional Radiological Society of Europe, and is also the official organ of a number of additional distinguished national and international interventional radiological societies. CVIR publishes double blinded peer-reviewed original research work including clinical and laboratory investigations, technical notes, case reports, works in progress, and letters to the editor, as well as review articles, pictorial essays, editorials, and special invited submissions in the field of vascular and interventional radiology. Beside the communication of the latest research results in this field, it is also the aim of CVIR to support continuous medical education. Articles that are accepted for publication are done so with the understanding that they, or their substantive contents, have not been and will not be submitted to any other publication.