The COVID-19 pandemic has placed a disproportionate burden on racial and ethnic minority groups, but incomplete surveillance data limit understanding of these disparities. CDC's case-based surveillance system contains case-level information on most COVID-19 cases in the United States. The data analyzed in this paper comprise COVID-19 cases with case-level information through September 25, 2020, which represent 70.9% of all COVID-19 cases reported to CDC during the period. Case-level surveillance data are used to investigate COVID-19 disparities by race/ethnicity, sex, and age. However, demographic information on race and ethnicity is missing for a substantial percentage of COVID-19 cases (e.g., 35.8% and 47.2% of cases analyzed were missing race and ethnicity information, respectively). Our goal in this study was to impute missing race and ethnicity to derive more accurate incidence and incidence rate ratio (IRR) estimates for different racial and ethnic groups. We also evaluated the results from imputation against complete case analysis, which removes cases with missing race/ethnicity information from the analysis. Two multiple imputation (MI) models were developed. Model 1 imputes race using six binary race variables, and Model 2 imputes race as a composite multinomial variable. Our evaluation found that, compared with complete case analysis, MI reduced bias and improved coverage for incidence and IRR estimates for all race/ethnicity groups except the Non-Hispanic Multiple/other group. Our research highlights the importance of supplementing complete case analysis with additional methods of analysis to better describe racial and ethnic disparities. When race and ethnicity data are missing, multiple imputation may provide more accurate incidence and IRR estimates for monitoring these disparities, in tandem with efforts to improve the collection of race and ethnicity information for pandemic surveillance.
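As a rough illustration of how IRR estimates from several imputed datasets might be combined, the sketch below pools per-imputation log-IRR estimates with Rubin's rules. It is not the procedure used in the study: the function name `pool_log_irr`, the input values, and the use of a normal critical value (rather than the t reference distribution that Rubin's rules usually prescribe) are all assumptions made for brevity.

```python
import math
from statistics import mean, variance


def pool_log_irr(log_irrs, variances, z=1.96):
    """Pool per-imputation log-IRR estimates with Rubin's rules.

    log_irrs  : one log-IRR point estimate per imputed dataset (at least two)
    variances : the squared standard error of each log-IRR estimate
    z         : normal critical value (a simplifying assumption)
    Returns (pooled IRR, lower limit, upper limit) on the ratio scale.
    """
    m = len(log_irrs)
    q_bar = mean(log_irrs)        # pooled point estimate on the log scale
    w = mean(variances)           # within-imputation variance
    b = variance(log_irrs)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b       # total variance
    se = math.sqrt(t)
    return (math.exp(q_bar),
            math.exp(q_bar - z * se),
            math.exp(q_bar + z * se))


# Hypothetical inputs for illustration only; these are not study results.
example_log_irrs = [0.90, 0.95, 1.00, 0.92, 0.98]
example_variances = [0.004] * 5
irr, lower, upper = pool_log_irr(example_log_irrs, example_variances)
print(f"pooled IRR {irr:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

Pooling on the log scale keeps the interval positive and roughly symmetric before it is transformed back to a ratio. The between-imputation term widens the interval to reflect uncertainty about the missing race and ethnicity values, which complete case analysis does not account for.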
The latest data from the United States Renal Data System show that over 134,000 individuals with end-stage kidney disease (ESKD) started dialysis in 2019. ESKD patients on dialysis, the default treatment strategy, have high mortality and hospitalization rates, especially in the first year of dialysis. An alternative treatment strategy is (non-dialysis) conservative management (CM). The relative effectiveness of CM with respect to various patient outcomes, including survival, hospitalization, and health-related quality of life, is an active area of research, especially in elderly ESKD or advanced chronic kidney disease patients with serious comorbidities. A technical challenge inherent in comparing patient outcomes between CM and dialysis patient groups is that the start of follow-up time is "not defined" for patients on CM because they do not initiate dialysis. One solution is the use of putative dialysis initiation (PDI) time. In this work, we examine the validity of using PDI time to determine the start of follow-up in longitudinal retrospective and prospective cohort studies involving CM. We propose estimating PDI time using a linear mixed-effects model of kidney function decline over time, and we assess the efficacy of this approach via simulation studies. We also illustrate how the estimated PDI time can be used to effectively estimate the survival distribution.
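To make the idea of a PDI time concrete, the sketch below solves a fitted linear kidney-function trajectory for the time at which it reaches a dialysis-initiation threshold. It assumes the mixed-effects model has already been fitted elsewhere and that a patient's trajectory is the sum of the fixed and random intercepts and slopes. The function names, the threshold of 10 (in eGFR units), and the time unit are hypothetical choices for illustration, not values taken from the study.

```python
def pdi_time(intercept, slope, threshold=10.0):
    """Time at which the line intercept + slope * t reaches the threshold.

    Returns 0.0 if the patient is already at or below the threshold at
    baseline, and None if kidney function is not declining (no crossing).
    The default threshold is an assumption, not a value from the study.
    """
    if intercept <= threshold:
        return 0.0
    if slope >= 0:
        return None
    return (threshold - intercept) / slope


def patient_pdi_time(beta0, beta1, b0, b1, threshold=10.0):
    """PDI time from mixed-model pieces: fixed effects (beta0, beta1)
    plus a patient's predicted random intercept and slope (b0, b1)."""
    return pdi_time(beta0 + b0, beta1 + b1, threshold)


# Hypothetical patient: baseline kidney function of 20, declining by
# 2.5 units per unit of time, so the threshold of 10 is reached at t = 4.
print(patient_pdi_time(beta0=18.0, beta1=-2.0, b0=2.0, b1=-0.5))
```

Returning `None` for non-declining trajectories makes explicit that some CM patients may have no estimable PDI time under a linear model. How such patients are handled is a design decision that this sketch leaves open.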