Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT

IF 1.8 | CAS Tier 4 (Medicine) | Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING | Clinical Imaging | Pub Date: 2024-05-23 | DOI: 10.1016/j.clinimag.2024.110193
Cynthia L. Monroe, Yasser G. Abdelhafez, Kwame Atsina, Edris Aman, Lorenzo Nardo, Mohammad H. Madani
{"title":"人工智能大型语言模型 ChatGPT 对心脏成像问题回答的评估","authors":"Cynthia L. Monroe ,&nbsp;Yasser G. Abdelhafez ,&nbsp;Kwame Atsina ,&nbsp;Edris Aman ,&nbsp;Lorenzo Nardo ,&nbsp;Mohammad H. Madani","doi":"10.1016/j.clinimag.2024.110193","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.</p></div><div><h3>Methods</h3><p>30 questions were posed to ChatGPT-3.5 and ChatGPT-4 three times in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading categories by three observers—two board certified cardiologists and one board certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization was based on majority vote between at least two of the three observers.</p></div><div><h3>Results</h3><p>ChatGPT-3.5 answered seventeen of twenty eight questions correctly (61 %) by majority vote. Twenty one of twenty eight questions were answered correctly (75 %) by ChatGPT-4 by majority vote. Majority vote for correctness was not achieved for two questions. Twenty six of thirty questions were answered consistently by ChatGPT-3.5 (87 %). Twenty nine of thirty questions were answered consistently by ChatGPT-4 (97 %). ChatGPT-3.5 had both consistent and correct responses to seventeen of twenty eight questions (61 %). ChatGPT-4 had both consistent and correct responses to twenty of twenty eight questions (71 %).</p></div><div><h3>Conclusion</h3><p>ChatGPT-4 had overall better performance than ChatGTP-3.5 when answering cardiac imaging questions with regard to correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answers over half of cardiac imaging questions correctly, inaccurate, clinically misleading and inconsistent responses suggest the need for further refinement before its application for educating patients about cardiac imaging.</p></div>","PeriodicalId":50680,"journal":{"name":"Clinical Imaging","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0899707124001232/pdfft?md5=1511764978ad04d7187878b982fe9d58&pid=1-s2.0-S0899707124001232-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT\",\"authors\":\"Cynthia L. Monroe ,&nbsp;Yasser G. Abdelhafez ,&nbsp;Kwame Atsina ,&nbsp;Edris Aman ,&nbsp;Lorenzo Nardo ,&nbsp;Mohammad H. Madani\",\"doi\":\"10.1016/j.clinimag.2024.110193\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><p>To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.</p></div><div><h3>Methods</h3><p>30 questions were posed to ChatGPT-3.5 and ChatGPT-4 three times in three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading categories by three observers—two board certified cardiologists and one board certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. 
Final categorization was based on majority vote between at least two of the three observers.</p></div><div><h3>Results</h3><p>ChatGPT-3.5 answered seventeen of twenty eight questions correctly (61 %) by majority vote. Twenty one of twenty eight questions were answered correctly (75 %) by ChatGPT-4 by majority vote. Majority vote for correctness was not achieved for two questions. Twenty six of thirty questions were answered consistently by ChatGPT-3.5 (87 %). Twenty nine of thirty questions were answered consistently by ChatGPT-4 (97 %). ChatGPT-3.5 had both consistent and correct responses to seventeen of twenty eight questions (61 %). ChatGPT-4 had both consistent and correct responses to twenty of twenty eight questions (71 %).</p></div><div><h3>Conclusion</h3><p>ChatGPT-4 had overall better performance than ChatGTP-3.5 when answering cardiac imaging questions with regard to correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answers over half of cardiac imaging questions correctly, inaccurate, clinically misleading and inconsistent responses suggest the need for further refinement before its application for educating patients about cardiac imaging.</p></div>\",\"PeriodicalId\":50680,\"journal\":{\"name\":\"Clinical Imaging\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-05-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0899707124001232/pdfft?md5=1511764978ad04d7187878b982fe9d58&pid=1-s2.0-S0899707124001232-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Imaging\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0899707124001232\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Imaging","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0899707124001232","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract

Purpose

To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.

Methods

Thirty questions were posed to ChatGPT-3.5 and ChatGPT-4, each three times across three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist subspecialized in cardiac imaging. Consistency of responses across the three sessions was also evaluated. The final category for each question was the one assigned by at least two of the three observers (majority vote).
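A minimal sketch of the scoring scheme described above. The function names, data layout, and the use of exact string equality for the consistency check are illustrative assumptions, not the authors' analysis code; in the study, category and equivalence were judged by the human observers.

```python
from collections import Counter

# Labels used by the three observers, per the Methods above.
LABELS = {"correct", "incorrect", "clinically misleading"}

def final_category(observer_labels):
    """Majority-vote category: the label assigned by at least two of the
    three observers, or None when no two observers agree."""
    assert len(observer_labels) == 3 and set(observer_labels) <= LABELS
    label, count = Counter(observer_labels).most_common(1)[0]
    return label if count >= 2 else None

def is_consistent(session_answers):
    """Consistency across the three chat sessions, approximated here as
    exact equality; in the study, equivalence was judged by observers."""
    return len(set(session_answers)) == 1

# Two observers agree -> that label wins.
print(final_category(["correct", "correct", "incorrect"]))                # correct
# All three disagree -> no majority; the question drops out of the
# correctness tally (this happened for two of the thirty questions).
print(final_category(["correct", "incorrect", "clinically misleading"]))  # None
```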

Results

ChatGPT-3.5 answered seventeen of twenty-eight questions correctly (61 %) by majority vote; ChatGPT-4 answered twenty-one of twenty-eight correctly (75 %). For two of the thirty questions, no two observers agreed on a category, so those questions were excluded from the correctness tally, leaving twenty-eight. ChatGPT-3.5 answered twenty-six of thirty questions consistently (87 %); ChatGPT-4 answered twenty-nine of thirty consistently (97 %). ChatGPT-3.5 gave responses that were both consistent and correct for seventeen of twenty-eight questions (61 %); ChatGPT-4 did so for twenty of twenty-eight (71 %).
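The reported percentages follow directly from the counts; note that correctness rates use the twenty-eight questions that reached a majority vote as the denominator, while consistency rates use all thirty. A quick recomputation (counts copied from the Results above):

```python
# Recomputing the reported rates from the counts in the Results above.
rates = {
    "ChatGPT-3.5 correct":              (17, 28),  # 61 %
    "ChatGPT-4 correct":                (21, 28),  # 75 %
    "ChatGPT-3.5 consistent":           (26, 30),  # 87 %
    "ChatGPT-4 consistent":             (29, 30),  # 97 %
    "ChatGPT-3.5 consistent + correct": (17, 28),  # 61 %
    "ChatGPT-4 consistent + correct":   (20, 28),  # 71 %
}
for name, (k, n) in rates.items():
    print(f"{name}: {k}/{n} = {k / n:.0%}")
```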

Conclusion

ChatGPT-4 performed better overall than ChatGPT-3.5 in both the correctness and the consistency of its answers to cardiac imaging questions. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses suggest the need for further refinement before these models are used to educate patients about cardiac imaging.

Source journal

Clinical Imaging (Medicine: Nuclear Medicine)

CiteScore: 4.60
Self-citation rate: 0.00%
Articles per year: 265
Review turnaround: 35 days
Journal introduction: The mission of Clinical Imaging is to publish, in a timely manner, the very best radiology research from the United States and around the world, with special attention to the impact of medical imaging on patient care. The journal's publications cover all imaging modalities, radiology issues related to patients, policy and practice improvements, and clinically oriented imaging physics and informatics. The journal is a valuable resource for practicing radiologists, radiologists-in-training, and other clinicians with an interest in imaging. Papers are carefully peer-reviewed and selected by experienced subject editors who are leading experts spanning the range of imaging sub-specialties, which include:

- Body Imaging
- Breast Imaging
- Cardiothoracic Imaging
- Imaging Physics and Informatics
- Molecular Imaging and Nuclear Medicine
- Musculoskeletal and Emergency Imaging
- Neuroradiology
- Practice, Policy & Education
- Pediatric Imaging
- Vascular and Interventional Radiology