Authors: Hiten Doshi, J. Chudow, K. Ferrick, A. Krumerman
Journal: Journal of Medical Artificial Intelligence
Published: 2021-09-01
DOI: 10.21037/jmai-21-12
Citations: 0
Machine learning in atrial fibrillation—racial bias and a call for caution
© Journal of Medical Artificial Intelligence. All rights reserved. J Med Artif Intell 2021;4:6 | https://dx.doi.org/10.21037/jmai-21-12

Early diagnosis of atrial fibrillation (AF), a common arrhythmia that can cause adverse events such as stroke, is a major clinical challenge. Because AF is often asymptomatic and paroxysmal, it is easily missed on a single electrocardiogram (ECG), which makes outpatient screening difficult. As a result, patients may not receive a timely diagnosis, with up to 5% of all AF cases being diagnosed at the time of stroke (1). Various machine learning (ML) models, primarily supervised ones, have been developed in the hope of providing an effective population screening tool. While these models show strong performance in their respective studies, data on their effectiveness across racial groups are lacking. Using ML for AF screening therefore requires two important considerations: (I) any biases in the training data will be perpetuated in the models' predictions; (II) AF has a known racial paradox, in which traditional risk factors derived from a largely Caucasian population correlate more weakly with AF incidence in Black patients. Below, we elaborate on these points and argue that while ML presents a unique opportunity to increase the detection of AF, it also deserves special caution to avoid reinforcing existing healthcare disparities.

ML AF screening tools are commonly developed using ECG data on P waves, R-R intervals, heart rate, and other parameters. While this approach has produced strong predictive models, the actual data sources deserve scrutiny (2). A recently published systematic review found that although more than 100 publications have used ECG data to develop ML models, more than half of them used the same four open-access ECG databases (3).
In theory, this is not necessarily problematic, and it is understandable that so many studies reuse well-known and freely available datasets. Ideally, however, the datasets would report enough patient diversity to represent the entire US population well. Instead, many of the most commonly used ECG datasets report only limited demographic data, such as the patient’s age, gender, and/or baseline clinical characteristics, without reporting racial or ethnic background. Given the known racial differences in several baseline ECG parameters, including left ventricular hypertrophy, right axis deviation, bundle branch blocks, and others, transparency about racial demographic information in these datasets is critical (4). Table 1 summarizes the most commonly used ECG databases, as well as the readily available demographic information provided by each.

The reuse of these datasets is of particular concern in the diagnosis of AF, a disease with a known “racial paradox”. This paradox refers to the fact that while Black patients have a higher burden of AF risk factors, including hypertension, diabetes, congestive heart failure, and others, they have a lower incidence of AF (5). Many explanations for this paradox have been proposed, including underdiagnosis of AF in Black patients due to lower healthcare access, regional genetic variations, or an unequal influence of certain risk factors between racial groups (6-8). Whatever the explanation, the presence of this paradox makes data transparency in AF an even greater priority. Just as traditional risk factors for AF correlated less well with incidence in Black patients, we may now be developing ML models with the same shortcomings.

One solution is for hospital systems to develop AF models using their own internal databases. The Mayo

Letter to the Editor
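
The concern raised above, that model performance is rarely reported by racial group, can be made concrete with a stratified evaluation. The following is a minimal sketch and not a method taken from the letter: the function name `stratified_metrics`, the group labels "A" and "B", and the toy labels and predictions are all hypothetical, chosen only to show how a model that looks acceptable overall can perform unevenly across groups.

```python
from collections import defaultdict


def stratified_metrics(y_true, y_pred, groups):
    """Return per-group sensitivity and specificity for binary AF labels.

    y_true, y_pred: sequences of 0/1 (1 = AF present / AF predicted).
    groups: sequence of group labels, one per record (hypothetical here).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[group][key] += 1

    report = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        report[group] = {
            "n": positives + negatives,
            # None marks a metric that cannot be computed for this group.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return report


# Hypothetical toy data: four records in each of two groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = stratified_metrics(y_true, y_pred, groups)
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

In this toy example the pooled sensitivity is 0.75, which hides the fact that every missed AF case falls in group B. A report of this kind is only possible when the underlying dataset records the group label in the first place, which is the transparency gap described above.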