Inside the “brain” of an artificial neural network: an interpretable deep learning approach to paroxysmal atrial fibrillation diagnosis from electrocardiogram signals during sinus rhythm

European heart journal. Digital health. Published 2022-10-01. DOI: 10.1093/ehjdh/ztac076.2781

P. Pantelidis, E. Oikonomou, S. Lampsas, N. Souvaliotis, M. Spartalis, M. Vavuranakis, M. Bampa, P. Papapetrou, G. Siasos, M. Vavuranakis

Abstract

Background: With ongoing, rapid advances in deep learning (DL), DL models can now detect medical conditions that are invisible even to the human eye. Building on this, efforts have been made to develop DL algorithms that diagnose paroxysmal atrial fibrillation (PAF) from electrocardiogram (ECG) signals recorded during sinus rhythm (SR). However, many of the available approaches function as “black boxes”: physicians cannot understand their predictions and therefore cannot trust them.

Purpose: To train a DL model that detects PAF patients while they are in SR, and to apply an algorithm that interprets and visualises its decisions.

Methods: We obtained ECG samples recorded during SR from PAF and non-PAF patients in the PAF Prediction Challenge Database. After discarding unannotated samples and augmenting the sample size by dividing each signal into 30-second segments, we split the whole dataset into a training (68%), a validation (16%) and a test (16%) set. No patient contributed samples to more than one set. We trained the InceptionTime neural network on the training and validation sets and evaluated it on the “unseen” test set, whose correct labels were hidden from the model. Performance was assessed with accuracy, F1-score, precision and recall (sensitivity). We repeated this process 20 times to obtain a distribution for each metric. Finally, we adapted the Grad-CAM interpretation algorithm to our data and used it to visualise the ECG regions the model treated as important.

Results: After pre-processing, 4,080 two-lead, 30-second ECG segments were allocated to the training set, 960 to the validation set and 960 to the test set. Each subset contained equal numbers of PAF and non-PAF samples. Across the repeated training and testing runs, we obtained a median accuracy of 0.84 (interquartile range, IQR: 0.66–0.88), a median F1-score of 0.82 (IQR: 0.68–0.88), a median precision of 0.93 (IQR: 0.67–0.99) and a median recall of 0.77 (IQR: 0.68–0.93).
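The evaluation protocol described above (30-second segmentation, a patient-disjoint 68/16/16 split with balanced classes, four binary metrics, and median/IQR summaries over repeated runs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `segment_record`, `patient_level_split`, `binary_metrics` and `median_iqr` are hypothetical, the 128 Hz sampling rate is an assumption to be checked against the database documentation, shuffling patients separately within each class is one assumed way of keeping the subsets balanced, and NumPy's default percentile interpolation may differ from whatever the authors used for the IQR.

```python
import numpy as np

FS = 128          # ASSUMED sampling rate in Hz; confirm against the database documentation
SEG_SECONDS = 30  # segment length stated in the abstract


def segment_record(signal, fs=FS, seg_seconds=SEG_SECONDS):
    """Cut an (n_samples, n_leads) record into non-overlapping fixed-length segments.

    Trailing samples that do not fill a complete segment are dropped.
    """
    seg_len = fs * seg_seconds
    n_segs = signal.shape[0] // seg_len
    trimmed = signal[: n_segs * seg_len]
    return trimmed.reshape(n_segs, seg_len, signal.shape[1])


def patient_level_split(patient_ids, labels, fractions=(0.68, 0.16, 0.16), seed=0):
    """Assign whole patients to train/val/test so that no patient spans two sets.

    Assumes each patient has a single label (PAF or non-PAF). Patients are shuffled
    separately within each class to keep the sets balanced. Returns segment indices.
    """
    patient_ids = np.asarray(patient_ids)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    groups = {"train": [], "val": [], "test": []}
    for cls in np.unique(labels):
        patients = np.unique(patient_ids[labels == cls])
        rng.shuffle(patients)
        n_train = int(round(fractions[0] * len(patients)))
        n_val = int(round(fractions[1] * len(patients)))
        groups["train"].extend(patients[:n_train])
        groups["val"].extend(patients[n_train:n_train + n_val])
        groups["test"].extend(patients[n_train + n_val:])
    # Map the patient assignment back to segment indices.
    return {name: np.flatnonzero(np.isin(patient_ids, ids)) for name, ids in groups.items()}


def binary_metrics(y_true, y_pred):
    """Accuracy, F1-score, precision and recall with PAF as the positive class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / y_true.size, "f1": f1,
            "precision": precision, "recall": recall}


def median_iqr(scores):
    """Median and interquartile range of one metric across repeated runs."""
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    return median, (q1, q3)
```

In a repeated-evaluation loop, one would call `patient_level_split` with a different `seed` on each of the 20 repetitions, train and test the model, collect the dictionaries from `binary_metrics`, and pass each metric's 20 values to `median_iqr`. Splitting by patient rather than by segment is what prevents segments from the same recording from leaking between training and test data.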
The Grad-CAM technique highlighted the ECG regions that led to each decision. We present selected PAF-positive and PAF-negative samples, both correctly and incorrectly classified. Interestingly, correct model decisions tended to focus on the P-wave, whereas incorrect ones fixated on other regions.

Conclusions: Although this is a pilot study with considerable limitations (small sample size; possible confounding by comorbidities or other factors was not accounted for), it shows how DL can be employed to distinguish PAF from non-PAF patients using SR ECG samples, and it confirms the potential of DL-enabled approaches to offer novel diagnostic capabilities. Most importantly, it provides a comprehensible, visual interpretation of the model's decisions. Demystifying DL behaviour can not only improve such efforts by explaining incorrect decisions, but also cultivate trust among clinicians and possibly point to directions for future research, since we can now see through the magnifying lens of a neural network.

Funding Acknowledgement: Type of funding sources: None.

Figure 1, Figure 2
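The Grad-CAM step mentioned in the Methods and Results can be illustrated with a minimal sketch for one-dimensional ECG input. Grad-CAM weights each channel of the last convolutional block by the time-averaged gradient of the class score, sums the weighted feature maps, and keeps only the positive part. The sketch below assumes (our assumption, not a detail given in the abstract) that the network ends with global average pooling followed by a single linear layer, as in the reference InceptionTime design; under that assumption the gradient has a closed form, so no autodiff framework is needed. The function name `grad_cam_1d` is hypothetical, and the authors' "adjusted" Grad-CAM may differ in its details. With a different head, the gradients would have to come from the framework's autodiff instead.

```python
import numpy as np


def grad_cam_1d(feature_maps, class_weights, signal_length):
    """Grad-CAM heat map for a 1-D CNN with a global-average-pooling + linear head.

    feature_maps:  array (K, T), activations A_k(t) of the last convolutional block
    class_weights: array (K,), weights w_k^c from pooled channel k to the class-c logit
    signal_length: number of ECG samples the map is resampled to for overlaying
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    class_weights = np.asarray(class_weights, dtype=float)
    n_channels, n_steps = feature_maps.shape
    # For y^c = sum_k w_k^c * mean_t A_k(t) + b, the gradient dy^c/dA_k(t) equals w_k^c / T.
    grads = np.broadcast_to(class_weights[:, None] / n_steps, (n_channels, n_steps))
    # Grad-CAM channel importance: gradients averaged over the time axis.
    alpha = grads.mean(axis=1)
    # Weighted combination of the feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum(alpha @ feature_maps, 0.0)
    # Linearly resample to the input length so the map can be drawn over the ECG trace.
    cam = np.interp(np.arange(signal_length),
                    np.linspace(0, signal_length - 1, n_steps), cam)
    # Scale to [0, 1] for display (an all-zero map is returned unchanged).
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Because InceptionTime-style blocks typically preserve the temporal length, `n_steps` will often already equal the number of ECG samples, in which case the interpolation is an identity. The resulting map can then be plotted as a colour overlay on each 30-second segment to check whether the model attends to the P-wave.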
