Seong-ho Lee, Brian D. Richardson, Yanyuan Ma, Karen S. Marder, Tanya P. Garcia
arXiv - STAT - Methodology. Published 2024-09-12. DOI: arxiv-2409.07795 (https://doi.org/arxiv-2409.07795)
Robust and efficient estimation in the presence of a randomly censored covariate
In Huntington's disease research, a current goal is to understand how
symptoms change prior to a clinical diagnosis. Statistically, this entails
modeling symptom severity as a function of the covariate 'time until
diagnosis', which is often heavily right-censored in observational studies.
Existing estimators that handle right-censored covariates have varying
statistical efficiency and robustness to misspecified models for nuisance
distributions (those of the censored covariate and censoring variable). On one
extreme, complete case estimation, which utilizes uncensored data only, is free
of nuisance distribution models but discards informative censored observations.
On the other extreme, maximum likelihood estimation is maximally efficient but
inconsistent when the covariate's distribution is misspecified. We propose a
semiparametric estimator that is robust and efficient. When the nuisance
distributions are modeled parametrically, the estimator is doubly robust, i.e.,
consistent if at least one distribution is correctly specified, and
semiparametric efficient if both models are correctly specified. When the
nuisance distributions are estimated via nonparametric or machine learning
methods, the estimator is consistent and semiparametric efficient. We show
empirically that the proposed estimator, implemented in the R package sparcc,
has its claimed properties, and we apply it to study Huntington's disease
symptom trajectories using data from the Enroll-HD study.
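
As a rough illustration of the complete case estimator mentioned above, the following sketch simulates a right-censored covariate and fits a simple linear outcome model two ways: on uncensored rows only, and naively with the observed (possibly censored) value substituted for the true covariate. Everything here is an assumption made for illustration: the linear model, the uniform distributions, the independence of the censoring variable, and the helper names `ols` and `simulate`. It does not use the sparcc package and is not the proposed semiparametric estimator.

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=20000, beta0=1.0, beta1=-0.5, seed=0):
    """Simulate an outcome y and a right-censored covariate (hypothetical setup)."""
    rng = random.Random(seed)
    w, delta, y = [], [], []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)  # true covariate, e.g. time until diagnosis
        c = rng.uniform(0.0, 2.0)  # censoring variable, independent of x and the noise
        y.append(beta0 + beta1 * x + rng.gauss(0.0, 1.0))
        w.append(min(x, c))        # what is actually observed
        delta.append(x <= c)       # True when the covariate is uncensored
    return w, delta, y


w, delta, y = simulate()

# Complete case estimation: keep only rows where the covariate was observed.
complete = [(wi, yi) for wi, di, yi in zip(w, delta, y) if di]
cc_intercept, cc_slope = ols([p[0] for p in complete], [p[1] for p in complete])

# Naive substitution: treat the censored value as if it were the true covariate.
naive_intercept, naive_slope = ols(w, y)

print("complete case:", round(cc_intercept, 3), round(cc_slope, 3))
print("naive        :", round(naive_intercept, 3), round(naive_slope, 3))
```

Under these assumptions the complete case fit should land near the true coefficients (1.0 and -0.5) while using only about half of the rows, which is the efficiency cost the abstract refers to. The naive fit uses every row but is pulled away from the true values because a censored observation understates the true covariate.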