A Multidisciplinary Assessment of ChatGPT's Knowledge of Amyloidosis: Observational Study.

JMIR Cardio (Q2, Medicine). Published 2024-04-19. Article e53421. DOI: 10.2196/53421
Ryan C. King, Jamil S. Samaan, Yee Hui Yeo, Yuxin Peng, David C. Kunkel, Ali A. Habib, Roxana Ghashghaei
{"title":"A Multidisciplinary Assessment of ChatGPT's Knowledge of Amyloidosis: Observational Study.","authors":"Ryan C. King, Jamil S. Samaan, Yee Hui Yeo, Yuxin Peng, David C Kunkel, Ali A. Habib, Roxana Ghashghaei","doi":"10.2196/53421","DOIUrl":null,"url":null,"abstract":"BACKGROUND\nAmyloidosis, a rare multisystem condition, often requires complex, multidisciplinary care. Its low prevalence underscores the importance of efforts to ensure the availability of high-quality patient education materials for better outcomes. ChatGPT (OpenAI) is a large language model powered by artificial intelligence that offers a potential avenue for disseminating accurate, reliable, and accessible educational resources for both patients and providers. Its user-friendly interface, engaging conversational responses, and the capability for users to ask follow-up questions make it a promising future tool in delivering accurate and tailored information to patients.\n\n\nOBJECTIVE\nWe performed a multidisciplinary assessment of the accuracy, reproducibility, and readability of ChatGPT in answering questions related to amyloidosis.\n\n\nMETHODS\nIn total, 98 amyloidosis questions related to cardiology, gastroenterology, and neurology were curated from medical societies, institutions, and amyloidosis Facebook support groups and inputted into ChatGPT-3.5 and ChatGPT-4. Cardiology- and gastroenterology-related responses were independently graded by a board-certified cardiologist and gastroenterologist, respectively, who specialize in amyloidosis. These 2 reviewers (RG and DCK) also graded general questions for which disagreements were resolved with discussion. Neurology-related responses were graded by a board-certified neurologist (AAH) who specializes in amyloidosis. Reviewers used the following grading scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Questions were stratified by categories for further analysis. Reproducibility was assessed by inputting each question twice into each model. The readability of ChatGPT-4 responses was also evaluated using the Textstat library in Python (Python Software Foundation) and the Textstat readability package in R software (R Foundation for Statistical Computing).\n\n\nRESULTS\nChatGPT-4 (n=98) provided 93 (95%) responses with accurate information, and 82 (84%) were comprehensive. ChatGPT-3.5 (n=83) provided 74 (89%) responses with accurate information, and 66 (79%) were comprehensive. When examined by question category, ChatGTP-4 and ChatGPT-3.5 provided 53 (95%) and 48 (86%) comprehensive responses, respectively, to \"general questions\" (n=56). When examined by subject, ChatGPT-4 and ChatGPT-3.5 performed best in response to cardiology questions (n=12) with both models producing 10 (83%) comprehensive responses. For gastroenterology (n=15), ChatGPT-4 received comprehensive grades for 9 (60%) responses, and ChatGPT-3.5 provided 8 (53%) responses. Overall, 96 of 98 (98%) responses for ChatGPT-4 and 73 of 83 (88%) for ChatGPT-3.5 were reproducible. The readability of ChatGPT-4's responses ranged from 10th to beyond graduate US grade levels with an average of 15.5 (SD 1.9).\n\n\nCONCLUSIONS\nLarge language models are a promising tool for accurate and reliable health information for patients living with amyloidosis. However, ChatGPT's responses exceeded the American Medical Association's recommended fifth- to sixth-grade reading level. 
Future studies focusing on improving response accuracy and readability are warranted. Prior to widespread implementation, the technology's limitations and ethical implications must be further explored to ensure patient safety and equitable implementation.","PeriodicalId":14706,"journal":{"name":"JMIR Cardio","volume":" 723","pages":"e53421"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Cardio","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/53421","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Medicine","Score":null,"Total":0}

Abstract

BACKGROUND
Amyloidosis, a rare multisystem condition, often requires complex, multidisciplinary care. Its low prevalence underscores the importance of ensuring that high-quality patient education materials are available to support better outcomes. ChatGPT (OpenAI) is a large language model powered by artificial intelligence that offers a potential avenue for disseminating accurate, reliable, and accessible educational resources to both patients and providers. Its user-friendly interface, engaging conversational responses, and support for follow-up questions make it a promising future tool for delivering accurate and tailored information to patients.

OBJECTIVE
We performed a multidisciplinary assessment of the accuracy, reproducibility, and readability of ChatGPT in answering questions related to amyloidosis.

METHODS
In total, 98 amyloidosis questions related to cardiology, gastroenterology, and neurology were curated from medical societies, institutions, and amyloidosis Facebook support groups and inputted into ChatGPT-3.5 and ChatGPT-4. Cardiology- and gastroenterology-related responses were independently graded by a board-certified cardiologist and gastroenterologist, respectively, each specializing in amyloidosis. These 2 reviewers (RG and DCK) also graded general questions, with disagreements resolved by discussion. Neurology-related responses were graded by a board-certified neurologist (AAH) who specializes in amyloidosis. Reviewers used the following grading scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Questions were stratified by category for further analysis. Reproducibility was assessed by inputting each question twice into each model. The readability of ChatGPT-4 responses was also evaluated using the Textstat library in Python (Python Software Foundation) and the Textstat readability package in R software (R Foundation for Statistical Computing).

RESULTS
ChatGPT-4 (n=98) provided 93 (95%) responses with accurate information, and 82 (84%) were comprehensive. ChatGPT-3.5 (n=83) provided 74 (89%) responses with accurate information, and 66 (79%) were comprehensive. When examined by question category, ChatGPT-4 and ChatGPT-3.5 provided 53 (95%) and 48 (86%) comprehensive responses, respectively, to "general questions" (n=56). When examined by subject, ChatGPT-4 and ChatGPT-3.5 performed best on cardiology questions (n=12), with both models producing 10 (83%) comprehensive responses. For gastroenterology (n=15), ChatGPT-4 received comprehensive grades for 9 (60%) responses and ChatGPT-3.5 for 8 (53%). Overall, 96 of 98 (98%) responses for ChatGPT-4 and 73 of 83 (88%) for ChatGPT-3.5 were reproducible. The readability of ChatGPT-4's responses ranged from the 10th-grade to beyond the graduate US grade level, with an average of 15.5 (SD 1.9).

CONCLUSIONS
Large language models are a promising tool for providing accurate and reliable health information to patients living with amyloidosis. However, ChatGPT's responses exceeded the American Medical Association's recommended fifth- to sixth-grade reading level. Future studies focusing on improving response accuracy and readability are warranted. Before widespread implementation, the technology's limitations and ethical implications must be further explored to ensure patient safety and equitable implementation.
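The abstract states that the readability of ChatGPT-4 responses was scored with the Textstat library in Python. The minimal sketch below illustrates how a mean US grade level could be computed for a set of responses; the specific readability formula (Flesch-Kincaid is assumed here), the variable names, and the sample responses are assumptions for illustration, not details taken from the study.

```python
# Minimal sketch (not the study's actual script): estimating the US grade level
# of model responses with the Python textstat library.
# Assumption: the Flesch-Kincaid grade is used; the abstract does not name the formula.
import statistics

import textstat

# Hypothetical example responses; the study's graded responses are not reproduced here.
responses = [
    "Amyloidosis is a condition in which abnormal proteins, called amyloid, "
    "build up in organs and tissues and interfere with their normal function.",
    "Cardiac amyloidosis can present with symptoms of heart failure, such as "
    "shortness of breath, fatigue, and swelling in the legs.",
]

# Score each response and summarize, mirroring the mean (SD) reported in the abstract.
grades = [textstat.flesch_kincaid_grade(text) for text in responses]
print(f"Mean US grade level: {statistics.mean(grades):.1f}")
if len(grades) > 1:
    print(f"SD: {statistics.stdev(grades):.1f}")
```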