Multiple Imputation of Missing Race and Ethnicity in CDC COVID-19 Case-Level Surveillance Data.

Guangyu Zhang, Charles E Rose, Yujia Zhang, Rui Li, Florence C Lee, Greta Massetti, Laura E Adams
International Journal of Statistics in Medical Research, vol. 11, pp. 1-11. Published 2022-01-28.
DOI: https://doi.org/10.6000/1929-6029.2022.11.01
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8967240/pdf/
Citations: 5

Abstract

The COVID-19 pandemic has resulted in a disproportionate burden on racial and ethnic minority groups, but incompleteness in surveillance data limits understanding of disparities. CDC's case-based surveillance system contains case-level information on most COVID-19 cases in the United States. Data analyzed in this paper contain COVID-19 cases with case-level information through September 25, 2020, which represent 70.9% of all COVID-19 cases reported to CDC during the period. Case-level surveillance data are used to investigate COVID-19 disparities by race/ethnicity, sex, and age. However, demographic information on race and ethnicity is missing for a substantial percentage of COVID-19 cases (e.g., 35.8% and 47.2% of cases analyzed were missing race and ethnicity information, respectively). Our goal in this study was to impute missing race and ethnicity to derive more accurate incidence and incidence rate ratio (IRR) estimates for different racial and ethnic groups, and evaluate the results from imputation compared to complete case analysis, which involves removing cases with missing race/ethnicity information from the analysis. Two multiple imputation (MI) models were developed. Model 1 imputes race using six binary race variables, and Model 2 imputes race as a composite multinomial variable. Our evaluation found that compared with complete case analysis, MI reduced biases and improved coverage on incidence and IRR estimates for all race/ethnicity groups, except for the Non-Hispanic Multiple/other group. Our research highlights the importance of supplementing complete case analysis with additional methods of analysis to better describe racial and ethnic disparities. When race and ethnicity data are missing, multiple imputation may provide more accurate incidence and IRR estimates to monitor these disparities in tandem with efforts to improve the collection of race and ethnicity information for pandemic surveillance.
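To make the pooling step behind multiple imputation concrete, the following is a minimal sketch of how an incidence rate ratio could be combined across imputed datasets with Rubin's rules. It is not the authors' Model 1 or Model 2: the case counts, population denominators, group labels, and function names are all hypothetical, the variance of the log IRR uses a simple Poisson approximation, and the interval uses a normal quantile rather than the t reference distribution usually paired with Rubin's rules.

```python
import math

def pool_rubin(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                               # pooled point estimate
    within = sum(variances) / m                              # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, total

def log_irr(cases_a, pop_a, cases_b, pop_b):
    """Log incidence rate ratio (group A vs. group B) and its approximate variance."""
    est = math.log((cases_a / pop_a) / (cases_b / pop_b))
    var = 1 / cases_a + 1 / cases_b  # Poisson approximation, populations treated as fixed
    return est, var

# Hypothetical case counts for groups A and B in each of three imputed datasets.
imputed_counts = [(1200, 900), (1250, 880), (1180, 910)]
POP_A, POP_B = 100_000, 150_000  # hypothetical population denominators

results = [log_irr(a, POP_A, b, POP_B) for a, b in imputed_counts]
est, total_var = pool_rubin([r[0] for r in results], [r[1] for r in results])

pooled_irr = math.exp(est)
half_width = 1.96 * math.sqrt(total_var)  # normal approximation (simplification)
ci = (math.exp(est - half_width), math.exp(est + half_width))

print(f"Pooled IRR: {pooled_irr:.3f}, approx. 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Pooling on the log scale keeps the ratio estimate roughly symmetric before it is transformed back. The between-imputation term is what carries the extra uncertainty due to the missing race and ethnicity values, which complete case analysis does not account for.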
