Machine learning in atrial fibrillation—racial bias and a call for caution

Hiten Doshi, J. Chudow, K. Ferrick, A. Krumerman
Journal of Medical Artificial Intelligence. Published 2021-09-01. Publication type: Journal Article. DOI: 10.21037/jmai-21-12 (https://doi.org/10.21037/jmai-21-12)
Abstract

Letter to the Editor
© Journal of Medical Artificial Intelligence. All rights reserved. J Med Artif Intell 2021;4:6 | https://dx.doi.org/10.21037/jmai-21-12

Early diagnosis of atrial fibrillation (AF), a common arrhythmia that can cause adverse events such as stroke, is a major clinical challenge. Because AF is often asymptomatic and paroxysmal, it is easily missed on a single electrocardiogram (ECG), which makes outpatient screening difficult. As a result, patients may not receive a timely diagnosis, with up to 5% of all AF cases being diagnosed at the time of stroke (1). Various machine learning (ML) models, primarily involving supervised ML methods, have been developed in the hope of producing an effective population screening tool. While these models show strong performance in their respective studies, data on their effectiveness across racial groups are lacking. Therefore, using ML for AF screening requires two important considerations: (I) any biases in the training data will be perpetuated in the predictions that the models offer; (II) AF has a known racial paradox, in which traditional risk factors derived from a largely Caucasian population correlate more weakly with AF incidence in Black patients. Below, we elaborate on these points and argue that while ML presents a unique opportunity to increase the detection of AF, it also deserves special caution to avoid reinforcing existing healthcare disparities.

ML-based AF screening tools are commonly developed using ECG data on p-waves, R-R intervals, heart rate, and other parameters. While this approach has produced strong predictive models, the underlying data sources deserve scrutiny (2). A recently published systematic review found that while more than 100 publications use ECG data to develop ML models, more than half of them used the same four open-access ECG databases (3). In theory, this is not necessarily problematic, and it is understandable that so many studies reuse well-known and freely available datasets. Ideally, however, the datasets would report enough patient diversity to represent the entire US population well. Instead, many of the most commonly used ECG datasets report only limited demographic data, including the patient’s age, gender, and/or baseline clinical characteristics, without reporting racial or ethnic background. Considering the known racial differences in several baseline ECG parameters, including left ventricular hypertrophy, right axis deviation, bundle branch blocks, and others, transparency about racial demographic information in these datasets is critical (4). Table 1 summarizes the most commonly used ECG databases, as well as the readily available demographic information provided by each.

The reuse of these datasets is of particular concern in the diagnosis of AF, a disease with a known “racial paradox”. This paradox refers to the fact that while Black patients have a higher burden of AF risk factors, including hypertension, diabetes, congestive heart failure, and others, they paradoxically have a lower incidence of AF (5). Many explanations for this paradox have been proposed, including underdiagnosis of AF in Black patients due to lower healthcare access, regional genetic variations, or an unequal influence of certain risk factors between racial groups (6-8). Whatever the explanation, the presence of this paradox makes data transparency in AF an even greater priority. In the same way that traditional risk factors for AF showed worse correlations with incidence in Black patients, we may now be developing ML models with the same shortcomings.

One solution is for hospital systems to develop AF models using their own internal databases. The Mayo ...
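The concern that model performance is rarely reported across racial groups can be made concrete with a stratified evaluation. The following is a minimal sketch, not the authors' method: it assumes a binary AF classifier whose predictions are already available, and the function name `subgroup_metrics`, the group labels "A" and "B", and the toy labels are all hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def subgroup_metrics(y_true, y_pred, groups):
    """Return per-group sensitivity and specificity for binary labels (1 = AF)."""
    # Tally the four confusion-matrix cells separately for every group.
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        metrics[group] = {
            "n": positives + negatives,
            # None signals that the metric is undefined for this group.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return metrics


# Made-up labels for two hypothetical groups; not real patient data.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

result = subgroup_metrics(y_true, y_pred, groups)
for group, m in sorted(result.items()):
    print(group, m)
```

In this toy example the pooled figures would hide the fact that the classifier performs worse for group "B" than for group "A". A stratified report of this kind is only possible when the dataset records group membership, which is the transparency gap the letter describes.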