Robust and efficient estimation in the presence of a randomly censored covariate
Seong-ho Lee, Brian D. Richardson, Yanyuan Ma, Karen S. Marder, Tanya P. Garcia
arXiv - STAT - Methodology, 2024-09-12. DOI: arxiv-2409.07795
Abstract
In Huntington's disease research, a current goal is to understand how
symptoms change prior to a clinical diagnosis. Statistically, this entails
modeling symptom severity as a function of the covariate 'time until
diagnosis', which is often heavily right-censored in observational studies.
Existing estimators that handle right-censored covariates have varying
statistical efficiency and robustness to misspecified models for nuisance
distributions (those of the censored covariate and censoring variable). On one
extreme, complete case estimation, which utilizes uncensored data only, is free
of nuisance distribution models but discards informative censored observations.
On the other extreme, maximum likelihood estimation is maximally efficient but
inconsistent when the covariate's distribution is misspecified. We propose a
semiparametric estimator that is robust and efficient. When the nuisance
distributions are modeled parametrically, the estimator is doubly robust, i.e.,
consistent if at least one distribution is correctly specified, and
semiparametric efficient if both models are correctly specified. When the
nuisance distributions are estimated via nonparametric or machine learning
methods, the estimator is consistent and semiparametric efficient. We show
empirically that the proposed estimator, implemented in the R package sparcc,
has its claimed properties, and we apply it to study Huntington's disease
symptom trajectories using data from the Enroll-HD study.
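
To make the complete case estimator mentioned in the abstract concrete, here is a minimal simulation sketch. It is not the proposed semiparametric estimator and does not use the sparcc package. Everything in it is an illustrative assumption: a linear outcome model, exponential distributions for the covariate and the censoring time, the parameter values, and the function names. The censoring time is drawn independently of the covariate and the outcome noise, and the complete case fit relies on that independence.

```python
import random


def simulate(n, beta0=1.0, beta1=-0.5, seed=0):
    """Simulate outcomes with a right-censored covariate (assumed toy model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.expovariate(1.0)  # true covariate, e.g. time until diagnosis
        c = rng.expovariate(1.0)  # censoring time, independent of x and the noise
        y = beta0 + beta1 * x + rng.gauss(0.0, 0.5)  # symptom severity
        w = min(x, c)             # what is actually observed
        delta = 1 if x <= c else 0  # 1 if the covariate is uncensored
        rows.append((y, w, delta))
    return rows


def ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def complete_case_fit(rows):
    """Fit the outcome model using only rows with an uncensored covariate."""
    xs = [w for (_, w, d) in rows if d == 1]  # w equals x when delta == 1
    ys = [y for (y, _, d) in rows if d == 1]
    return ols(xs, ys)


rows = simulate(20000)
cc_beta0, cc_beta1 = complete_case_fit(rows)
censored_fraction = 1 - sum(d for (_, _, d) in rows) / len(rows)

print("complete case intercept:", round(cc_beta0, 3))
print("complete case slope:", round(cc_beta1, 3))
print("fraction of observations discarded:", round(censored_fraction, 3))
```

In this setup roughly half of the observations are censored and therefore discarded. That illustrates the trade-off the abstract describes: the complete case fit needs no model for the nuisance distributions, but it pays for that in efficiency.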