Special issue dedicated to Mitchell H. Gail, M.D. Ph.D.
Pub Date: 2024-07-01 | Epub Date: 2024-06-24 | DOI: 10.1007/s10985-024-09631-0
Mei-Ling Ting Lee
{"title":"Special issue dedicated to Mitchell H. Gail, M.D. Ph.D.","authors":"Mei-Ling Ting Lee","doi":"10.1007/s10985-024-09631-0","DOIUrl":"10.1007/s10985-024-09631-0","url":null,"abstract":"","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141443608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A constrained maximum likelihood approach to developing well-calibrated models for predicting binary outcomes.
Pub Date: 2024-07-01 | Epub Date: 2024-05-08 | DOI: 10.1007/s10985-024-09628-9
Yaqi Cao, Weidong Ma, Ge Zhao, Anne Marie McCarthy, Jinbo Chen
The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with and without the candidate predictors. Such a comparison is most meaningful when the risks estimated by the two models are both unbiased in the target population. Very often, data for candidate predictors are sourced from nonrepresentative convenience samples. Updating the base model using the study data without acknowledging the discrepancy between the underlying distribution of the study data and that of the target population can lead to biased risk estimates and therefore an unfair evaluation of the candidate predictors. To address this issue, assuming access to a well-calibrated base model, we propose a semiparametric method for model fitting that enforces good calibration. The central idea is to calibrate the fitted model against the base model by enforcing suitable constraints when maximizing the likelihood function. This approach enables unbiased assessment of the model improvement offered by candidate predictors without requiring a representative sample from the target population, thus overcoming a significant practical challenge. We study the theoretical properties of the model parameter estimates and demonstrate improvement in model calibration via extensive simulation studies. Finally, we apply the proposed method to data extracted from the Penn Medicine Biobank to assess the added value of breast density for breast cancer risk assessment in Caucasian women.
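To make the central idea concrete, here is a minimal sketch of constrained maximum likelihood for a logistic model: the expanded model is fit subject to a calibration constraint tying its fitted risks to a well-calibrated base model. The single moment-matching constraint, the simulated data, and all parameter values are illustrative stand-ins, not the paper's actual constraint set.

```python
# Sketch: constrained MLE for a logistic model, calibrated against a base model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)             # predictor in the base model
z = rng.normal(size=n)             # candidate predictor
p_true = 1 / (1 + np.exp(-(-1.0 + 0.8 * x + 0.5 * z)))
y = rng.binomial(1, p_true)

beta_base = np.array([-1.0, 0.8])  # assumed well-calibrated base model (known)
risk_base = 1 / (1 + np.exp(-(beta_base[0] + beta_base[1] * x)))

X = np.column_stack([np.ones(n), x, z])

def negloglik(beta):
    # negative logistic log-likelihood
    eta = X @ beta
    return np.sum(np.log1p(np.exp(eta)) - y * eta)

def calib_gap(beta):
    # illustrative calibration constraint: mean fitted risk matches the base model
    return np.mean(1 / (1 + np.exp(-(X @ beta)))) - np.mean(risk_base)

fit = minimize(negloglik, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "eq", "fun": calib_gap}])
print("constrained MLE:", fit.x, "constraint residual:", calib_gap(fit.x))
```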
{"title":"A constrained maximum likelihood approach to developing well-calibrated models for predicting binary outcomes.","authors":"Yaqi Cao, Weidong Ma, Ge Zhao, Anne Marie McCarthy, Jinbo Chen","doi":"10.1007/s10985-024-09628-9","DOIUrl":"10.1007/s10985-024-09628-9","url":null,"abstract":"<p><p>The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with or without including candidate predictors. Such comparison is most meaningful when the estimated risk by the two models are both unbiased in the target population. Very often data for candidate predictors are sourced from nonrepresentative convenience samples. Updating the base model using the study data without acknowledging the discrepancy between the underlying distribution of the study data and that in the target population can lead to biased risk estimates and therefore an unfair evaluation of candidate predictors. To address this issue assuming access to a well-calibrated base model, we propose a semiparametric method for model fitting that enforces good calibration. The central idea is to calibrate the fitted model against the base model by enforcing suitable constraints in maximizing the likelihood function. This approach enables unbiased assessment of model improvement offered by candidate predictors without requiring a representative sample from the target population, thus overcoming a significant practical challenge. We study theoretical properties for model parameter estimates, and demonstrate improvement in model calibration via extensive simulation studies. Finally, we apply the proposed method to data extracted from Penn Medicine Biobank to inform the added value of breast density for breast cancer risk assessment in the Caucasian woman population.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140877759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Competing risks and multivariate outcomes in epidemiological and clinical trial research.
Pub Date: 2024-07-01 | Epub Date: 2024-05-06 | DOI: 10.1007/s10985-024-09629-8
R L Prentice
Data analysis methods for the study of treatments or exposures in relation to a clinical outcome in the presence of competing risks have a long history, often with inference targets that are hypothetical and therefore require strong assumptions for identifiability with available data. Here, data analysis methods are considered that are based on single and higher-dimensional marginal hazard rates, quantities that are identifiable under standard independent censoring assumptions. These lead naturally to joint survival function estimators for outcomes of interest, including competing risk outcomes, and provide the basis for addressing a variety of data analysis questions. These methods are illustrated using simulations and Women's Health Initiative cohort and clinical trial data sets, and additional research needs are described.
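As a concrete illustration of the single-outcome building block, the sketch below computes a Nelson-Aalen estimate of a marginal cumulative hazard, which is identifiable under independent censoring; the simulated data and parameters are stand-ins, and the higher-dimensional joint estimators discussed in the paper are not reproduced.

```python
# Sketch: Nelson-Aalen estimate of a marginal cumulative hazard.
import numpy as np

rng = np.random.default_rng(2)
n = 500
t_event = rng.exponential(1.0, n)      # latent failure times, Exp(rate 1)
t_cens = rng.exponential(1.5, n)       # independent censoring times
x = np.minimum(t_event, t_cens)        # observed time
d = (t_event <= t_cens).astype(int)    # 1 = failure observed

order = np.argsort(x)
x, d = x[order], d[order]
at_risk = n - np.arange(n)             # risk-set size just before each x (no ties)
cum_haz = np.cumsum(d / at_risk)       # Nelson-Aalen estimator

# for Exp(rate 1) failures the true cumulative hazard at t = 1 is 1.0
print("Nelson-Aalen estimate at t=1:", cum_haz[x <= 1.0][-1])
```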
{"title":"Competing risks and multivariate outcomes in epidemiological and clinical trial research.","authors":"R L Prentice","doi":"10.1007/s10985-024-09629-8","DOIUrl":"10.1007/s10985-024-09629-8","url":null,"abstract":"<p><p>Data analysis methods for the study of treatments or exposures in relation to a clinical outcome in the presence of competing risks have a long history, often with inference targets that are hypothetical, thereby requiring strong assumptions for identifiability with available data. Here data analysis methods are considered that are based on single and higher dimensional marginal hazard rates, quantities that are identifiable under standard independent censoring assumptions. These lead naturally to joint survival function estimators for outcomes of interest, including competing risk outcomes, and provide the basis for addressing a variety of data analysis questions. These methods will be illustrated using simulations and Women's Health Initiative cohort and clinical trial data sets, and additional research needs will be described.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140858787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Bayesian quantile joint modeling of multivariate longitudinal and time-to-event data.
Pub Date: 2024-07-01 | Epub Date: 2024-03-01 | DOI: 10.1007/s10985-024-09622-1
Damitri Kundu, Shekhar Krishnan, Manash Pratim Gogoi, Kiranmoy Das
Linear mixed models are traditionally used for jointly modeling (multivariate) longitudinal outcomes and event times. However, when the outcomes are non-Gaussian, a quantile regression model is more appropriate. In addition, in the presence of time-varying covariates, it can be of interest to see how the effects of different covariates vary from one quantile level (of the outcomes) to another, and consequently how the event time changes across quantiles. For such analyses, linear quantile mixed models can be used, and an efficient computational algorithm can be developed. We analyze a dataset from the Acute Lymphocytic Leukemia (ALL) maintenance study conducted by Tata Medical Center, Kolkata. In this study, patients suffering from ALL were treated with two standard drugs (6MP and MTx) for the first two years, and three biomarkers (lymphocyte count, neutrophil count, and platelet count) were measured longitudinally. After treatment, the patients were followed for nearly three years, and the relapse time (if any) for each patient was recorded. For this dataset we develop a Bayesian quantile joint model for the three longitudinal biomarkers and the time to relapse. We consider an asymmetric Laplace distribution (ALD) for each outcome, and exploit the mixture representation of the ALD to develop a Gibbs sampler for estimating the regression coefficients. Our proposed model allows different quantile levels for different biomarkers, while simultaneously estimating the regression coefficients corresponding to a particular quantile combination. We infer that a higher lymphocyte count accelerates the chance of a relapse, while higher neutrophil and platelet counts (jointly) reduce it. We also infer that across (almost) all quantiles 6MP reduces the lymphocyte count, while MTx increases the neutrophil count. Simulation studies are performed to assess the effectiveness of the proposed approach.
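The tractability of such a Gibbs sampler rests on the normal-exponential mixture representation of the ALD: conditional on an exponential mixing variable, the outcome is Gaussian, so conjugate updates apply. The sketch below only verifies the representation itself, namely that the p-th quantile of the resulting draws sits at the location parameter; it is not the paper's sampler, and all values are illustrative.

```python
# Sketch: normal-exponential mixture representation of the asymmetric Laplace.
import numpy as np

rng = np.random.default_rng(3)
p = 0.25                                  # quantile level
theta = (1 - 2 * p) / (p * (1 - p))
tau = np.sqrt(2 / (p * (1 - p)))

z = rng.exponential(1.0, size=200_000)    # latent mixing variable, Exp(1)
u = rng.normal(size=200_000)
eps = theta * z + tau * np.sqrt(z) * u    # ALD(0, 1, p) draws; Gaussian given z

# sanity check: the p-th quantile of eps should be (near) zero
print("P(eps <= 0) =", np.mean(eps <= 0), "target:", p)
```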
{"title":"A Bayesian quantile joint modeling of multivariate longitudinal and time-to-event data.","authors":"Damitri Kundu, Shekhar Krishnan, Manash Pratim Gogoi, Kiranmoy Das","doi":"10.1007/s10985-024-09622-1","DOIUrl":"10.1007/s10985-024-09622-1","url":null,"abstract":"<p><p>Linear mixed models are traditionally used for jointly modeling (multivariate) longitudinal outcomes and event-time(s). However, when the outcomes are non-Gaussian a quantile regression model is more appropriate. In addition, in the presence of some time-varying covariates, it might be of interest to see how the effects of different covariates vary from one quantile level (of outcomes) to the other, and consequently how the event-time changes across different quantiles. For such analyses linear quantile mixed models can be used, and an efficient computational algorithm can be developed. We analyze a dataset from the Acute Lymphocytic Leukemia (ALL) maintenance study conducted by Tata Medical Center, Kolkata. In this study, the patients suffering from ALL were treated with two standard drugs (6MP and MTx) for the first two years, and three biomarkers (e.g. lymphocyte count, neutrophil count and platelet count) were longitudinally measured. After treatment the patients were followed nearly for the next three years, and the relapse-time (if any) for each patient was recorded. For this dataset we develop a Bayesian quantile joint model for the three longitudinal biomarkers and time-to-relapse. We consider an Asymmetric Laplace Distribution (ALD) for each outcome, and exploit the mixture representation of the ALD for developing a Gibbs sampler algorithm to estimate the regression coefficients. Our proposed model allows different quantile levels for different biomarkers, but still simultaneously estimates the regression coefficients corresponding to a particular quantile combination. We infer that a higher lymphocyte count accelerates the chance of a relapse while a higher neutrophil count and a higher platelet count (jointly) reduce it. Also, we infer that across (almost) all quantiles 6MP reduces the lymphocyte count, while MTx increases the neutrophil count. Simulation studies are performed to assess the effectiveness of the proposed approach.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139998108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the role of Volterra integral equations in self-consistent, product-limit, inverse probability of censoring weighted, and redistribution-to-the-right estimators for the survival function.
Pub Date: 2024-07-01 | Epub Date: 2024-03-21 | DOI: 10.1007/s10985-024-09623-0
Robert L Strawderman, Benjamin R Baer
This paper reconsiders several results of historical and current importance to nonparametric estimation of the survival distribution for failure in the presence of right-censored observation times, demonstrating in particular how Volterra integral equations help interconnect the resulting estimators. The paper begins by considering Efron's self-consistency equation, introduced in a seminal 1967 Berkeley symposium paper. Novel insights provided in the current work include the observations that (i) the self-consistency equation leads directly to an anticipating Volterra integral equation whose solution is given by a product-limit estimator for the censoring survival function; (ii) a definition used in this argument immediately establishes the familiar product-limit estimator for the failure survival function; (iii) the usual Volterra integral equation for the product-limit estimator of the failure survival function leads to an immediate and simple proof that it can be represented as an inverse probability of censoring weighted estimator; (iv) a simple identity characterizes the relationship between natural inverse probability of censoring weighted estimators for the survival and distribution functions of failure; (v) the resulting inverse probability of censoring weighted estimators, attributed to a highly influential 1992 paper of Robins and Rotnitzky, were implicitly introduced in Efron's 1967 paper in its development of the redistribution-to-the-right algorithm. All results developed herein allow for ties between failure and/or censored observations.
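Observation (iii) can be checked numerically: with continuous (tie-free) data, the product-limit estimator coincides exactly with the inverse probability of censoring weighted estimator built from the censoring-survival product-limit curve. A minimal sketch, with simulated data as a stand-in:

```python
# Sketch: product-limit estimator == IPCW estimator (tie-free data).
import numpy as np

def km(x, d):
    """Product-limit estimator: returns sorted times and S(t) at those times."""
    order = np.argsort(x)
    x, d = x[order], d[order]
    at_risk = len(x) - np.arange(len(x))
    return x, np.cumprod(1.0 - d / at_risk)

def step_eval(times, values, t):
    """Evaluate a right-continuous step function (value 1 before first jump)."""
    idx = np.searchsorted(times, t, side="right") - 1
    return np.where(idx < 0, 1.0, values[np.maximum(idx, 0)])

rng = np.random.default_rng(4)
n = 400
tf, tc = rng.weibull(1.5, n), rng.exponential(1.2, n)  # continuous, so no ties
x, d = np.minimum(tf, tc), (tf <= tc).astype(float)

xs, S = km(x, d)       # product-limit curve for failure
xg, G = km(x, 1 - d)   # product-limit curve for censoring (roles swapped)

t0 = 1.0
fail = (d == 1) & (x <= t0)
# IPCW estimate of F(t0); with no ties, G(X_i-) = G(X_i) at failure times
F_ipcw = np.sum(1.0 / step_eval(xg, G, x[fail])) / n
print("1 - IPCW estimate: ", 1.0 - F_ipcw)
print("product-limit S(t0):", step_eval(xs, S, np.array([t0]))[0])  # identical
```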
{"title":"On the role of Volterra integral equations in self-consistent, product-limit, inverse probability of censoring weighted, and redistribution-to-the-right estimators for the survival function.","authors":"Robert L Strawderman, Benjamin R Baer","doi":"10.1007/s10985-024-09623-0","DOIUrl":"10.1007/s10985-024-09623-0","url":null,"abstract":"<p><p>This paper reconsiders several results of historical and current importance to nonparametric estimation of the survival distribution for failure in the presence of right-censored observation times, demonstrating in particular how Volterra integral equations help inter-connect the resulting estimators. The paper begins by considering Efron's self-consistency equation, introduced in a seminal 1967 Berkeley symposium paper. Novel insights provided in the current work include the observations that (i) the self-consistency equation leads directly to an anticipating Volterra integral equation whose solution is given by a product-limit estimator for the censoring survival function; (ii) a definition used in this argument immediately establishes the familiar product-limit estimator for the failure survival function; (iii) the usual Volterra integral equation for the product-limit estimator of the failure survival function leads to an immediate and simple proof that it can be represented as an inverse probability of censoring weighted estimator; (iv) a simple identity characterizes the relationship between natural inverse probability of censoring weighted estimators for the survival and distribution functions of failure; (v) the resulting inverse probability of censoring weighted estimators, attributed to a highly influential 1992 paper of Robins and Rotnitzky, were implicitly introduced in Efron's 1967 paper in its development of the redistribution-to-the-right algorithm. All results developed herein allow for ties between failure and/or censored observations.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140186140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Risk projection for time-to-event outcome from population-based case-control studies leveraging summary statistics from the target population.
Pub Date: 2024-07-01 | Epub Date: 2024-05-28 | DOI: 10.1007/s10985-024-09626-x
Jiayin Zheng, Li Hsu
Risk stratification based on prediction models has become increasingly important in preventing and managing chronic diseases. However, due to cost and time limitations, not every population has the resources to collect sufficiently detailed individual-level information on a large number of people to develop risk prediction models. A more practical approach is to use prediction models developed from existing studies and calibrate them with relevant summary-level information from the target population. Many existing studies were conducted under the population-based case-control design. Gail et al. (J Natl Cancer Inst 81:1879-1886, 1989) proposed combining the odds ratio estimates obtained from case-control data with the disease incidence rates of the target population to obtain the baseline hazard function, and thereby the pure risk of developing disease. However, the approach requires that the risk factor distribution of cases from the case-control studies be the same as in the target population, which, if violated, may yield biased risk estimates. In this article, we propose two novel weighted estimating equation approaches that calibrate the baseline risk by leveraging summary information on (some) risk factors in addition to disease-free probabilities from the target population. We establish the consistency and asymptotic normality of the proposed estimators. Extensive simulation studies and an application to colorectal cancer studies demonstrate that the proposed estimators perform well in reducing bias in finite samples.
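For context, here is a minimal sketch of the Gail et al. style recalibration that the proposal extends: solve for the intercept (baseline risk) so that the average predicted risk over the target population matches its known disease probability. The logistic form, simulated covariates, and parameter values are illustrative; the paper's weighted estimating equations, which additionally use covariate summary statistics, are not reproduced.

```python
# Sketch: recalibrating the baseline risk to a known population incidence.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
beta = np.array([0.6, -0.3])            # log-ORs estimated from case-control data
x_pop = rng.normal(size=(10_000, 2))    # covariate draws for the target population
target_incidence = 0.05                 # known population disease probability

def mean_risk_gap(alpha):
    # average predicted risk under intercept alpha, minus the target
    eta = alpha + x_pop @ beta
    return np.mean(1 / (1 + np.exp(-eta))) - target_incidence

alpha_hat = brentq(mean_risk_gap, -20.0, 5.0)   # root gives the calibrated intercept
print("calibrated intercept:", alpha_hat)
```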
Measurement error models with zero inflation and multiple sources of zeros, with applications to hard zeros.
Pub Date: 2024-07-01 | Epub Date: 2024-05-28 | DOI: 10.1007/s10985-024-09627-w
Anindya Bhadra, Rubin Wei, Ruth Keogh, Victor Kipnis, Douglas Midthune, Dennis W Buckman, Ya Su, Ananya Roy Chowdhury, Raymond J Carroll
We consider measurement error models for two variables observed repeatedly and subject to measurement error. One variable is continuous, while the other variable is a mixture of continuous and zero measurements. This second variable has two sources of zeros. The first source is episodic zeros, wherein some of the measurements for an individual may be zero and others positive. The second source is hard zeros, i.e., some individuals will always report zero. An example is the consumption of alcohol from alcoholic beverages: some individuals consume alcoholic beverages episodically, while others never consume alcoholic beverages. However, with a small number of repeat measurements from individuals, it is not possible to determine those who are episodic zeros and those who are hard zeros. We develop a new measurement error model for this problem, and use Bayesian methods to fit it. Simulations and data analyses are used to illustrate our methods. Extensions to parametric models and survival analysis are discussed briefly.
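The identifiability problem described above is easy to reproduce by simulation: with only a couple of repeat measurements, an episodic consumer who happens to report zero every time is indistinguishable from a hard zero. A minimal sketch with illustrative parameter values:

```python
# Sketch: hard zeros versus episodic zeros with few repeat measurements.
import numpy as np

rng = np.random.default_rng(6)
n, repeats = 10_000, 2
hard_zero = rng.random(n) < 0.30          # never-consumers (hard zeros)
episodic_zero_prob = 0.5                  # chance an episodic consumer reports 0

reports = np.where(
    hard_zero[:, None],
    0.0,                                  # hard zeros always report zero
    np.where(rng.random((n, repeats)) < episodic_zero_prob,
             0.0,                         # episodic zero on this occasion
             rng.lognormal(0.0, 1.0, (n, repeats))),
)

all_zero = (reports == 0).all(axis=1)
print("fraction reporting all zeros:", np.mean(all_zero))
print("share of those who are actually episodic (not hard zeros):",
      np.mean(all_zero & ~hard_zero) / np.mean(all_zero))
```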
{"title":"Measurement error models with zero inflation and multiple sources of zeros, with applications to hard zeros.","authors":"Anindya Bhadra, Rubin Wei, Ruth Keogh, Victor Kipnis, Douglas Midthune, Dennis W Buckman, Ya Su, Ananya Roy Chowdhury, Raymond J Carroll","doi":"10.1007/s10985-024-09627-w","DOIUrl":"10.1007/s10985-024-09627-w","url":null,"abstract":"<p><p>We consider measurement error models for two variables observed repeatedly and subject to measurement error. One variable is continuous, while the other variable is a mixture of continuous and zero measurements. This second variable has two sources of zeros. The first source is episodic zeros, wherein some of the measurements for an individual may be zero and others positive. The second source is hard zeros, i.e., some individuals will always report zero. An example is the consumption of alcohol from alcoholic beverages: some individuals consume alcoholic beverages episodically, while others never consume alcoholic beverages. However, with a small number of repeat measurements from individuals, it is not possible to determine those who are episodic zeros and those who are hard zeros. We develop a new measurement error model for this problem, and use Bayesian methods to fit it. Simulations and data analyses are used to illustrate our methods. Extensions to parametric models and survival analysis are discussed briefly.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141162786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regression analysis of doubly censored failure time data with ancillary information
Pub Date: 2024-04-20 | DOI: 10.1007/s10985-024-09625-y
Mingyue Du, Xiyuan Gao, Ling Chen
Doubly censored failure time data occur in many areas; in this setting the failure time of interest usually represents the elapsed time between two related events, such as an infection and the resulting disease onset. Although many methods have been proposed for regression analysis of such data, most of them condition on the occurrence time of the initial event and ignore the relationship between the two events, or the ancillary information contained in the initial event. To address this, a new sieve maximum likelihood approach is proposed that makes use of the ancillary information; in the method, a logistic model and the Cox proportional hazards model are employed to model the initial event and the failure time of interest, respectively. A simulation study suggests that the proposed method works well in practice and, as expected, is more efficient than existing methods. The approach is applied to the AIDS study that motivated this investigation.
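For intuition about the data structure, the sketch below generates doubly censored data from two submodels of the kind named above: a discrete-time logistic hazard for the initial event and a Cox proportional hazards model with a Weibull baseline for the elapsed time. All modeling choices and parameter values here are illustrative simplifications; the sieve likelihood machinery is not reproduced.

```python
# Sketch: generating doubly censored failure time data.
import numpy as np

rng = np.random.default_rng(7)
n = 1000
z = rng.binomial(1, 0.5, n).astype(float)   # binary covariate

# initial event (e.g. infection) on a visit grid, with a discrete-time logistic
# hazard; the final-visit hazard is set to 1 so every subject has an initial
# event (an illustrative simplification)
grid = np.arange(1.0, 6.0)
V = np.full(n, np.nan)
for k, t in enumerate(grid):
    h = 1.0 if k == len(grid) - 1 else 1 / (1 + np.exp(-(-1.0 + 0.4 * z)))
    hit = np.isnan(V) & (rng.random(n) < h)
    V[hit] = t

# elapsed time to the second event: Cox PH with Weibull baseline,
# simulated by inverting the cumulative hazard lam * t**shape * exp(beta*z)
beta, shape, lam = 0.5, 1.5, 0.2
T = (-np.log(rng.random(n)) / (lam * np.exp(beta * z))) ** (1 / shape)

C = rng.uniform(2.0, 12.0, n)               # administrative right-censoring
X = np.minimum(V + T, C)                    # observed onset-or-censoring time
delta = (V + T <= C).astype(int)
print("proportion with observed onset:", delta.mean())
```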
{"title":"Regression analysis of doubly censored failure time data with ancillary information","authors":"Mingyue Du, Xiyuan Gao, Ling Chen","doi":"10.1007/s10985-024-09625-y","DOIUrl":"https://doi.org/10.1007/s10985-024-09625-y","url":null,"abstract":"<p>Doubly censored failure time data occur in many areas and for the situation, the failure time of interest usually represents the elapsed time between two related events such as an infection and the resulting disease onset. Although many methods have been proposed for regression analysis of such data, most of them are conditional on the occurrence time of the initial event and ignore the relationship between the two events or the ancillary information contained in the initial event. Corresponding to this, a new sieve maximum likelihood approach is proposed that makes use of the ancillary information, and in the method, the logistic model and Cox proportional hazards model are employed to model the initial event and the failure time of interest, respectively. A simulation study is conducted and suggests that the proposed method works well in practice and is more efficient than the existing methods as expected. The approach is applied to an AIDS study that motivated this investigation.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140625569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partial-linear single-index transformation models with censored data
Pub Date: 2024-04-16 | DOI: 10.1007/s10985-024-09624-z
Myeonggyun Lee, Andrea B. Troxel, Mengling Liu
In studies with time-to-event outcomes, multiple, inter-correlated, and time-varying covariates are commonly observed. It is of great interest to model their joint effects by allowing a flexible functional form and to delineate their relative contributions to survival risk. The class of semiparametric transformation (ST) models offers flexible specifications of the intensity function and provides a general framework for accommodating nonlinear covariate effects. In this paper, we propose a partial-linear single-index (PLSI) transformation model that reduces the dimensionality of multiple covariates into a single index and provides interpretable estimates of the covariate effects. We develop an iterative algorithm using the regression spline technique to model the nonparametric single-index function for possibly nonlinear joint effects, followed by nonparametric maximum likelihood estimation. We also propose a nonparametric testing procedure to formally examine the linearity of covariate effects. We conduct Monte Carlo simulation studies comparing the PLSI transformation model with the standard ST model and apply the method to NYU Langone Health de-identified electronic health record data on the mortality of hospitalized COVID-19 patients and to a Veterans Administration lung cancer trial.
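The iterative idea can be illustrated on a toy continuous-response analogue of the PLSI structure rather than the survival likelihood: alternate between fitting the unknown single-index function on the current index, here with a cubic polynomial standing in for the paper's regression splines, and updating the unit-norm index weights and the linear coefficient by numerical least squares. Everything below is an illustrative assumption, not the paper's estimator.

```python
# Sketch: alternating estimation for a toy partial-linear single-index model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 1500
Z = rng.normal(size=(n, 2))               # covariates entering the index
x = rng.normal(size=n)                    # covariate with a linear effect
w_true = np.array([0.6, 0.8])             # unit-norm index weights
y = np.sin(Z @ w_true) + 0.5 * x + 0.1 * rng.normal(size=n)

w, beta = np.array([1.0, 0.0]), 0.0
for _ in range(20):
    # step 1: fit the ridge function g on the current index (cubic stand-in)
    coef = np.polyfit(Z @ w, y - beta * x, deg=3)

    # step 2: update (w, beta) by numerical least squares, w on the unit sphere
    def sse(par):
        wv = par[:2] / np.linalg.norm(par[:2])
        return np.sum((y - np.polyval(coef, Z @ wv) - par[2] * x) ** 2)
    par = minimize(sse, np.r_[w, beta], method="Nelder-Mead").x
    w, beta = par[:2] / np.linalg.norm(par[:2]), par[2]

# note: (w, g) is identifiable only up to sign in this toy example
print("w estimate:", w, "beta estimate:", beta)
```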
Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data
Pub Date: 2024-04-02 | DOI: 10.1007/s10985-024-09621-2
The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase the efficiency of estimates of Cox model log-relative hazards, and there has been some work estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently available online to analyze these various options. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used “robust” variance estimate of Barlow (Biometrics 50:1064–1072, 1994) is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log-relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
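As background for the weighting involved, here is a minimal numpy sketch of a weighted Cox partial likelihood for unstratified case-cohort data, using a simple inverse-probability scheme (cases weighted 1 throughout, non-case subcohort members weighted 1/p) and a few Newton steps for one covariate. It is only a sketch of the general idea under illustrative parameters: stratification, weight calibration, and the variance calculations handled by the CaseCohortCoxSurvival package are not shown, and that package's actual API is not used here.

```python
# Sketch: weighted Cox partial likelihood for case-cohort data (one covariate).
import numpy as np

rng = np.random.default_rng(9)
n, p_sub, beta_true = 5000, 0.15, 0.7
z = rng.normal(size=n)
t = rng.exponential(1 / np.exp(beta_true * z))   # PH model, unit baseline hazard
c = rng.exponential(2.0, n)
x, d = np.minimum(t, c), (t <= c).astype(float)

in_sub = rng.random(n) < p_sub                   # phase-two subcohort
keep = in_sub | (d == 1)                         # covariates observed only here
w = np.where(d == 1, 1.0, 1.0 / p_sub)[keep]     # inverse-probability weights
x, d, z = x[keep], d[keep], z[keep]

order = np.argsort(-x)                           # decreasing time, for cumsums
x, d, z, w = x[order], d[order], z[order], w[order]

beta = 0.0
for _ in range(6):                               # Newton-Raphson iterations
    r = w * np.exp(beta * z)
    s0 = np.cumsum(r)                            # weighted risk-set sums
    s1 = np.cumsum(r * z)
    s2 = np.cumsum(r * z * z)
    score = np.sum(w * d * (z - s1 / s0))
    info = np.sum(w * d * (s2 / s0 - (s1 / s0) ** 2))
    beta += score / info
print("case-cohort estimate:", beta, "truth:", beta_true)
```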