Digesting Digital Health: A Study of Appropriateness and Readability of ChatGPT-Generated Gastroenterological Information.

IF 3 | Zone 3 (Medicine) | Q2 GASTROENTEROLOGY & HEPATOLOGY | Clinical and Translational Gastroenterology | Pub Date: 2024-08-30 | DOI: 10.14309/ctg.0000000000000765
Avi Toiv, Zachary Saleh, Angela Ishak, Eva Alsheik, Deepak Venkat, Neilanjan Nandi, Tobias E Zuchelli
Citations: 0

Abstract

Digesting Digital Health: A Study of Appropriateness and Readability of ChatGPT-Generated Gastroenterological Information.

Introduction: The advent of artificial intelligence-powered large language models capable of generating interactive responses to intricate queries marks a groundbreaking development in how patients access medical information. Our aim was to evaluate the appropriateness and readability of gastroenterological information generated by Chat Generative Pretrained Transformer (ChatGPT).

Methods: We analyzed responses generated by ChatGPT to 16 dialog-based queries assessing symptoms and treatments for gastrointestinal conditions and 13 definition-based queries on prevalent topics in gastroenterology. Three board-certified gastroenterologists evaluated output appropriateness with a 5-point Likert-scale proxy measurement of currency, relevance, accuracy, comprehensiveness, clarity, and urgency/next steps. Outputs with a score of 4 or 5 in all 6 categories were designated as "appropriate." Output readability was assessed with Flesch Reading Ease score, Flesch-Kincaid Reading Level, and Simple Measure of Gobbledygook scores.
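
The appropriateness rule and the three readability indices named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and category keys are hypothetical, the abstract does not say how the three raters' scores were combined (a single score per category is assumed here), and the formulas are the standard published ones applied to precomputed word, sentence, and syllable counts.

```python
import math

# Hypothetical keys for the six rated dimensions listed in the Methods.
CATEGORIES = ("currency", "relevance", "accuracy",
              "comprehensiveness", "clarity", "urgency_next_steps")

def is_appropriate(scores: dict) -> bool:
    """True when every one of the six categories is rated 4 or 5 (5-point scale)."""
    return all(scores[c] >= 4 for c in CATEGORIES)

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade-level formula (US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def smog_grade(polysyllables: int, sentences: int) -> float:
    """Standard SMOG formula; polysyllables = words with 3 or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291
```

A response with even one category rated 3 or lower fails `is_appropriate`, which is why the criterion is strict. For the readability indices, a lower Flesch Reading Ease and a higher grade level both indicate harder text.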

Results: ChatGPT responses to 44% of the 16 dialog-based and 69% of the 13 definition-based questions were deemed appropriate, and the proportion of appropriate responses within the 2 groups of questions was not significantly different (P = 0.17). Notably, none of ChatGPT's responses to questions related to gastrointestinal emergencies were designated appropriate. The mean readability scores showed that outputs were written at a college-level reading proficiency.
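
As a worked check of the reported comparison, the sketch below rebuilds the 2x2 table from the percentages and runs a Pearson chi-square test without continuity correction. Two assumptions are involved: the counts (7 of 16 and 9 of 13) are inferred from the rounded percentages, and the choice of test is a guess, since the abstract does not name the test the authors used. The function name is hypothetical.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: dialog-based, definition-based; columns: appropriate, not appropriate.
# Counts are inferred from 44% of 16 and 69% of 13 (assumption).
chi2, p = chi_square_2x2(7, 9, 9, 4)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 1.88, p = 0.17
```

Under these assumptions the result is consistent with the reported P = 0.17.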

Discussion: ChatGPT can produce generally fitting responses to gastroenterological medical queries, but responses were constrained in appropriateness and readability, which limits the current utility of this large language model. Substantial development is essential before these models can be unequivocally endorsed as reliable sources of medical information.

Source journal: Clinical and Translational Gastroenterology (GASTROENTEROLOGY & HEPATOLOGY)
CiteScore: 7.00
Self-citation rate: 0.00%
Articles published: 114
Review time: 16 weeks
Journal description: Clinical and Translational Gastroenterology (CTG), published on behalf of the American College of Gastroenterology (ACG), is a peer-reviewed open access online journal dedicated to innovative clinical work in the field of gastroenterology and hepatology. CTG hopes to fulfill an unmet need for clinicians and scientists by welcoming novel cohort studies, early-phase clinical trials, qualitative and quantitative epidemiologic research, hypothesis-generating research, studies of novel mechanisms and methodologies including public health interventions, and integration of approaches across organs and disciplines. CTG also welcomes hypothesis-generating small studies, methods papers, and translational research with clear applications to human physiology or disease. Topic areas: Colon and small bowel; Endoscopy and novel diagnostics; Esophagus; Functional GI disorders; Immunology of the GI tract; Microbiology of the GI tract; Inflammatory bowel disease; Pancreas and biliary tract; Liver; Pathology; Pediatrics; Preventative medicine; Nutrition/obesity; Stomach.