ChatGPT as a patient education tool in colorectal cancer—An in-depth assessment of efficacy, quality and readability

Colorectal Disease | IF 2.9 | JCR Q2 (Gastroenterology & Hepatology) | CAS Tier 3 (Medicine) | Pub Date: 2024-12-17 | DOI: 10.1111/codi.17267
Adrian H. Y. Siu, Damien P. Gibson, Chris Chiu, Allan Kwok, Matt Irwin, Adam Christie, Cherry E. Koh, Anil Keshava, Mifanwy Reece, Michael Suen, Matthew J. F. X. Rickard
{"title":"ChatGPT as a patient education tool in colorectal cancer—An in-depth assessment of efficacy, quality and readability","authors":"Adrian H. Y. Siu,&nbsp;Damien P. Gibson,&nbsp;Chris Chiu,&nbsp;Allan Kwok,&nbsp;Matt Irwin,&nbsp;Adam Christie,&nbsp;Cherry E. Koh,&nbsp;Anil Keshava,&nbsp;Mifanwy Reece,&nbsp;Michael Suen,&nbsp;Matthew J. F. X. Rickard","doi":"10.1111/codi.17267","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Aim</h3>\n \n <p>Artificial intelligence (AI) chatbots such as Chat Generative Pretrained Transformer-4 (ChatGPT-4) have made significant strides in generating human-like responses. Trained on an extensive corpus of medical literature, ChatGPT-4 has the potential to augment patient education materials. These chatbots may be beneficial to populations considering a diagnosis of colorectal cancer (CRC). However, the accuracy and quality of patient education materials are crucial for informed decision-making. Given workforce demands impacting holistic care, AI chatbots can bridge gaps in CRC information, reaching wider demographics and crossing language barriers. However, rigorous evaluation is essential to ensure accuracy, quality and readability. Therefore, this study aims to evaluate the efficacy, quality and readability of answers generated by ChatGPT-4 on CRC, utilizing patient-style question prompts.</p>\n </section>\n \n <section>\n \n <h3> Method</h3>\n \n <p>To evaluate ChatGPT-4, eight CRC-related questions were derived using peer-reviewed literature and Google Trends. Eight colorectal surgeons evaluated AI responses for accuracy, safety, appropriateness, actionability and effectiveness. Quality was assessed using validated tools: the Patient Education Materials Assessment Tool (PEMAT-AI), modified DISCERN (DISCERN-AI) and Global Quality Score (GQS). A number of readability assessments were measured including Flesch Reading Ease (FRE) and the Gunning Fog Index (GFI).</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The responses were generally accurate (median 4.00), safe (4.25), appropriate (4.00), actionable (4.00) and effective (4.00). Quality assessments rated PEMAT-AI as ‘very good’ (71.43), DISCERN-AI as ‘fair’ (12.00) and GQS as ‘high’ (4.00). Readability scores indicated difficulty (FRE 47.00, GFI 12.40), suggesting a higher educational level was required.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>This study concludes that ChatGPT-4 is capable of providing safe but nonspecific medical information, suggesting its potential as a patient education aid. However, enhancements in readability through contextual prompting and fine-tuning techniques are required before considering implementation into clinical practice.</p>\n </section>\n </div>","PeriodicalId":10512,"journal":{"name":"Colorectal Disease","volume":"27 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Colorectal Disease","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/codi.17267","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"GASTROENTEROLOGY & HEPATOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Aim

Artificial intelligence (AI) chatbots such as Chat Generative Pretrained Transformer-4 (ChatGPT-4) have made significant strides in generating human-like responses. Trained on an extensive corpus of medical literature, ChatGPT-4 has the potential to augment patient education materials. These chatbots may be beneficial to populations considering a diagnosis of colorectal cancer (CRC). However, the accuracy and quality of patient education materials are crucial for informed decision-making. Given workforce demands impacting holistic care, AI chatbots can bridge gaps in CRC information, reaching wider demographics and crossing language barriers. However, rigorous evaluation is essential to ensure accuracy, quality and readability. Therefore, this study aims to evaluate the efficacy, quality and readability of answers generated by ChatGPT-4 on CRC, utilizing patient-style question prompts.

Method

To evaluate ChatGPT-4, eight CRC-related questions were derived from peer-reviewed literature and Google Trends. Eight colorectal surgeons evaluated the AI responses for accuracy, safety, appropriateness, actionability and effectiveness. Quality was assessed using validated tools: the Patient Education Materials Assessment Tool (PEMAT-AI), modified DISCERN (DISCERN-AI) and the Global Quality Score (GQS). Readability was measured with several indices, including the Flesch Reading Ease (FRE) score and the Gunning Fog Index (GFI).
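The abstract does not state which software was used to compute the readability indices, but both follow standard published formulas. The sketch below is a minimal, illustrative Python implementation of the FRE and GFI calculations; the syllable counter is a crude vowel-group heuristic assumed here for self-containment, and dedicated packages such as textstat give more reliable counts.

import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, subtract a trailing silent 'e'.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Flesch Reading Ease: higher scores mean easier text (30-50 is 'difficult').
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Gunning Fog Index: estimates the years of schooling needed to follow the text.
    gfi = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    return {"FRE": round(fre, 2), "GFI": round(gfi, 2)}

print(readability("Colorectal cancer is usually treated with surgery. "
                  "Your surgeon will explain the operation and recovery."))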

Results

The responses were generally accurate (median 4.00), safe (4.25), appropriate (4.00), actionable (4.00) and effective (4.00). Quality assessments rated PEMAT-AI as ‘very good’ (71.43), DISCERN-AI as ‘fair’ (12.00) and GQS as ‘high’ (4.00). Readability scores indicated difficulty (FRE 47.00, GFI 12.40), suggesting a higher educational level was required.
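As a point of reference, these scores can be read against the conventional interpretation bands for the two indices (the band labels below are the standard Flesch categories, not thresholds defined by this study):

def interpret(fre: float, gfi: float) -> str:
    # Conventional Flesch Reading Ease bands (higher = easier to read).
    if fre >= 60:
        band = "plain English (about 8th-9th grade or easier)"
    elif fre >= 50:
        band = "fairly difficult (10th-12th grade)"
    elif fre >= 30:
        band = "difficult (college level)"
    else:
        band = "very difficult (college-graduate level)"
    # The Gunning Fog Index approximates years of formal schooling required.
    return (f"FRE {fre:.2f}: {band}; "
            f"GFI {gfi:.2f}: about {gfi:.0f} years of education")

print(interpret(47.00, 12.40))
# FRE 47.00: difficult (college level); GFI 12.40: about 12 years of education

On this reading, an FRE of 47 and a GFI of 12.4 both place the responses at roughly college-entry reading level, above the sixth-to-eighth-grade level often recommended for patient education materials.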

Conclusion

This study concludes that ChatGPT-4 is capable of providing safe but nonspecific medical information, suggesting its potential as a patient education aid. However, enhancements in readability through contextual prompting and fine-tuning techniques are required before considering implementation into clinical practice.

Source journal
Colorectal Disease (Medicine - Gastroenterology & Hepatology)
CiteScore: 6.10
Self-citation rate: 11.80%
Articles published: 406
Average review time: 1.5 months
Journal introduction: Diseases of the colon and rectum are common and offer a number of exciting challenges. Clinical, diagnostic and basic science research is expanding rapidly. There is increasing demand from purchasers of health care and patients for clinicians to keep abreast of the latest research and developments, and to translate these into routine practice. Technological advances in diagnosis, surgical technique, new pharmaceuticals, molecular genetics and other basic sciences have transformed many aspects of how these diseases are managed. Such progress will accelerate. Colorectal Disease offers a real benefit to subscribers and authors. It is first and foremost a vehicle for publishing original research relating to the demanding, rapidly expanding field of colorectal diseases. Essential for surgeons, pathologists, oncologists, gastroenterologists and health professionals caring for patients with a disease of the lower GI tract, Colorectal Disease furthers education and inter-professional development by including regular review articles and discussions of current controversies. Note that the journal does not usually accept paediatric surgical papers.