In the face of confounders: Atrial fibrillation detection – Practitioners vs. ChatGPT

Impact Factor: 1.3; Zone 4 (Medicine); JCR Q3 (Cardiac & Cardiovascular Systems); Journal of Electrocardiology; Pub Date: 2025-01-01; DOI: 10.1016/j.jelectrocard.2024.153851
Yuval Avidan MD , Vsevolod Tabachnikov MD , Orel Ben Court MD , Razi Khoury MD , Amir Aker MD
Journal of Electrocardiology, volume 88, Article 153851 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0022073624003212
Citations: 0

Abstract

Introduction

Atrial fibrillation (AF) is the most common arrhythmia in clinical practice, yet interpretation concerns among healthcare providers persist. Confounding factors contribute to false-positive and false-negative AF diagnoses, leading to potential omissions. Artificial intelligence advancements show promise in electrocardiogram (ECG) interpretation. We sought to examine the diagnostic accuracy of ChatGPT-4omni (GPT-4o), equipped with image evaluation capabilities, in interpreting ECGs with confounding factors and compare its performance to that of physicians.

Methods

Twenty ECG cases, divided into Group A (10 cases of AF or atrial flutter) and Group B (10 cases of sinus or another atrial rhythm), were crafted into multiple-choice questions. A total of 100 practitioners (25 from each of emergency medicine, internal medicine, primary care, and cardiology) were tasked with identifying the underlying rhythm. Next, GPT-4o was prompted in five separate sessions.

Results

GPT-4o performed inadequately, averaging 3 (±2) correct answers in Group A questions and 5.40 (±1.34) in Group B questions. When accuracy across all ECG questions was examined, no significant difference was found between GPT-4o and internists or primary care physicians (p = 0.952 and p = 0.852, respectively). Cardiologists outperformed the other medical disciplines and GPT-4o (p < 0.001). Emergency physicians followed in accuracy, though their comparison with GPT-4o indicated only a trend (p = 0.068).
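As a rough illustration of how a "mean (±SD)" summary over five prompting sessions can be computed, the sketch below uses Python's standard `statistics` module. The per-session scores are hypothetical placeholders chosen only to be numerically consistent with the reported summaries; they are not data from the study. The abstract also does not state whether a sample or population standard deviation was used, so the sample SD here is an assumption, as is the function name `summarize_sessions`.

```python
from statistics import mean, stdev


def summarize_sessions(scores):
    """Return (mean, sample SD) of per-session correct-answer counts."""
    if len(scores) < 2:
        raise ValueError("need at least two sessions to compute a sample SD")
    return mean(scores), stdev(scores)


# HYPOTHETICAL per-session scores (correct answers out of 10 questions).
# These are illustrative placeholders, NOT the study's actual session results.
hypothetical_group_a = [1, 1, 3, 5, 5]
hypothetical_group_b = [4, 4, 6, 6, 7]

for label, scores in (
    ("Group A", hypothetical_group_a),
    ("Group B", hypothetical_group_b),
):
    m, sd = summarize_sessions(scores)
    print(f"{label}: {m:.2f} (±{sd:.2f}) correct out of 10")
```

With only five sessions, a single unusually good or poor session shifts both the mean and the SD noticeably, which is one reason the spread is reported alongside the average.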

Conclusion

GPT-4o demonstrated suboptimal accuracy with significant under- and over-recognition of AF in ECGs with confounding factors. Despite its potential as a supportive tool for ECG interpretation, its performance did not surpass that of medical practitioners, underscoring the continued importance of human expertise in complex diagnostics.
Source journal

Journal of Electrocardiology (Medicine: Cardiac & Cardiovascular Systems)
CiteScore: 2.70
Self-citation rate: 7.70%
Articles published: 152
Review time: 38 days

About the journal: The Journal of Electrocardiology is devoted exclusively to clinical and experimental studies of the electrical activities of the heart. It seeks to contribute significantly to the accuracy of diagnosis and prognosis and the effective treatment, prevention, or delay of heart disease. Editorial contents include electrocardiography, vectorcardiography, arrhythmias, membrane action potential, cardiac pacing, monitoring defibrillation, instrumentation, drug effects, and computer applications.