I present some personal memories and thoughts on Cox's 1972 paper "Regression Models and Life-Tables".
Clustered and multivariate failure time data are commonly encountered in biomedical studies, and a marginal regression approach is often employed to identify potential risk factors for a failure. We consider a semiparametric marginal Cox proportional hazards model for right-censored survival data with potential correlation. We propose a quadratic inference function method, based on the generalized method of moments, to obtain optimal hazard ratio estimators. In the estimating equation, the inverse of the working correlation matrix is represented as a linear combination of basis matrices. We investigate the asymptotic properties of the regression estimators from the proposed method and discuss the optimality of the hazard ratio estimators. Our simulation study shows that the estimator from the quadratic inference approach is more efficient than those from existing estimating equation methods, whether or not the working correlation structure is correctly specified. Finally, we apply the model and the proposed estimation method to a study of tooth loss and obtain new insights that existing methods could not provide.
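The key device in a quadratic inference function (QIF) approach is writing the inverse working correlation as a combination of known basis matrices, so that the unknown correlation parameter never has to be estimated. The minimal sketch below assumes an exchangeable working correlation, for which the inverse is exactly a combination of the identity and a matrix with zeros on the diagonal and ones elsewhere, and shows the generic GMM-style quadratic form built from stacked per-cluster scores. The function names are hypothetical, and constructing the extended score for the marginal Cox model itself is not shown; the score rows passed to `qif` are placeholders.

```python
import numpy as np

def exchangeable_corr(m, rho):
    """Exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def inverse_as_basis_combination(m, rho):
    """Closed-form coefficients (a0, a1) with R^{-1} = a0 * M0 + a1 * M1,
    where M0 = I and M1 has 0 on the diagonal and 1 elsewhere
    (obtained from the Sherman-Morrison formula)."""
    c = rho / ((1.0 - rho) * (1.0 + (m - 1) * rho))
    a0 = 1.0 / (1.0 - rho) - c
    a1 = -c
    return a0, a1

def qif(scores):
    """Generic quadratic inference function Q = n * gbar' C^{-1} gbar.

    scores: (n, q) array whose i-th row is cluster i's extended score
    vector (one block per basis matrix), evaluated at a candidate beta.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    gbar = scores.mean(axis=0)
    C = scores.T @ scores / n  # empirical second-moment matrix of the scores
    return float(n * gbar @ np.linalg.solve(C, gbar))
```

Minimizing `qif` over the regression parameters weights the basis-matrix blocks optimally through `C`, which is where the efficiency gain over a single working-correlation estimating equation comes from.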
The Nun study is a well-known longitudinal epidemiological study of aging and dementia that recruited both elderly nuns who had not yet been diagnosed with dementia (the incident cohort) and nuns who already had dementia at entry (the prevalent cohort). In a natural history study of this kind, multistate modeling of the combined data from the incident and prevalent cohorts is desirable to improve the efficiency of inference. Despite their importance, multistate modeling approaches for the combined data have rarely been used in practice, because prevalent samples do not provide the exact date of disease onset and, owing to left truncation, do not represent the target population. In this paper, we demonstrate how to adequately combine incident and prevalent cohorts to examine risk factors for every possible transition in studying the natural history of dementia. We adapt a four-state nonhomogeneous Markov model to characterize all transitions between different clinical stages, including plausible reversible transitions. Estimation using the combined data yields efficiency gains for every transition compared with using the incident cohort data alone.
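In a nonhomogeneous Markov model, transition probabilities over an age interval follow from the age-dependent intensity matrix through a product integral. The sketch below approximates that product integral on a fine grid for a hypothetical four-state model with one reversible transition. The state labels, the intensity values, their age dependence, and the function names are all assumptions for illustration; estimation from combined incident and prevalent cohorts, including the left-truncation adjustment, is not shown.

```python
import numpy as np

# Hypothetical states: 0 = intact cognition, 1 = mild impairment,
# 2 = dementia, 3 = death (absorbing).
def intensity_matrix(age):
    """Hypothetical age-dependent transition intensities (per year)."""
    g = np.exp(0.08 * (age - 80.0))  # most rates rise with age
    q = np.zeros((4, 4))
    q[0, 1] = 0.05 * g   # intact -> mild impairment
    q[0, 3] = 0.03 * g   # intact -> death
    q[1, 0] = 0.10       # mild impairment -> intact (reversible transition)
    q[1, 2] = 0.08 * g   # mild impairment -> dementia
    q[1, 3] = 0.05 * g   # mild impairment -> death
    q[2, 3] = 0.15 * g   # dementia -> death
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of an intensity matrix sum to zero
    return q

def transition_probs(s, t, step=0.01):
    """Approximate P(s, t) by the product integral prod (I + Q(u) du)."""
    n_steps = max(1, int(round((t - s) / step)))
    du = (t - s) / n_steps
    p = np.eye(4)
    for k in range(n_steps):
        u = s + (k + 0.5) * du  # midpoint of the k-th sub-interval
        p = p @ (np.eye(4) + intensity_matrix(u) * du)
    return p
```

For example, `transition_probs(75.0, 85.0)[0, 2]` is the probability that a person with intact cognition at age 75 has dementia at age 85, which can only be reached through the mild-impairment state in this hypothetical model.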
The Kaplan-Meier estimator is ubiquitously used to estimate survival probabilities for time-to-event data. It is nonparametric, and thus does not require specification of a survival distribution, but it does assume that the risk set at any time t consists of independent observations. This assumption does not hold for data from paired organ systems, such as those in ophthalmology (eyes) or otolaryngology (ears), or for other types of clustered data. In this article, we estimate marginal survival probabilities in the setting of clustered data and provide confidence limits for these estimates, accounting for intra-cluster correlation through an interval-censored version of the Clayton-Oakes model. We develop a goodness-of-fit test for general bivariate interval-censored data and apply it to the proposed interval-censored version of the Clayton-Oakes model. We also propose a likelihood ratio test for comparing survival distributions between two groups in the setting of clustered data, under the assumption of a constant between-group hazard ratio. This methodology can be used for both balanced and unbalanced cluster sizes, and also when the cluster size is informative. We compare our test with the ordinary log-rank test and the Lin-Wei (LW) test, which is based on the marginal Cox proportional hazards model with robust standard errors obtained from the sandwich estimator. Simulation results indicate that the ordinary log-rank test inflates the type I error, whereas the proposed unconditional likelihood ratio test has an appropriate type I error and higher power than the LW test. The method is illustrated with real examples from the Sorbinil Retinopathy Trial and the Age-Related Macular Degeneration Study. Raw data from these two trials are provided.
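For a cluster of size two (for example, two eyes), the interval-censored Clayton-Oakes likelihood contribution is a rectangle probability built from the joint survival function. The minimal sketch below computes it from marginal survival values and the Clayton dependence parameter theta, and reports the implied Kendall's tau. The exponential margin in the usage example and all function names are assumptions for illustration; the paper's marginal survival estimation, goodness-of-fit test, and likelihood ratio test are not reproduced.

```python
import math

def clayton_joint_survival(s1, s2, theta):
    """Clayton-Oakes joint survival P(T1 > t1, T2 > t2) from marginal
    survival values s1 = S1(t1), s2 = S2(t2); theta > 0 gives positive dependence."""
    if s1 <= 0.0 or s2 <= 0.0:
        return 0.0
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)

def pair_interval_prob(surv, l1, r1, l2, r2, theta):
    """Likelihood contribution of a cluster of two whose event times are
    interval-censored in (l1, r1] and (l2, r2]; surv is the marginal
    survival function. Use r = math.inf for a right-censored member."""
    def joint(a, b):
        return clayton_joint_survival(surv(a), surv(b), theta)
    return joint(l1, l2) - joint(l1, r2) - joint(r1, l2) + joint(r1, r2)

def kendall_tau(theta):
    """Kendall's tau implied by the Clayton model."""
    return theta / (theta + 2.0)
```

As theta approaches zero the rectangle probability factorizes into the product of the two marginal interval probabilities, which is the working-independence assumption the ordinary log-rank test implicitly makes within clusters.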
We consider a novel class of semiparametric joint models for multivariate longitudinal and survival data with dependent censoring. In these models, cumulative baseline hazard functions of unknown form are fitted by a novel class of penalized splines (P-splines) with linear constraints. The dependence between the failure time of interest and the censoring time is accommodated by a normal transformation model, in which both the nonparametric marginal survival function and the censoring function are transformed to standard normal random variables with a bivariate normal joint distribution. Using a hybrid algorithm together with Metropolis-Hastings steps within a Gibbs sampler, we propose a feasible Bayesian method to simultaneously estimate the unknown parameters of interest and fit the baseline survival and censoring functions. Extensive simulation studies are conducted to assess the performance of the proposed method. The method is also illustrated with the analysis of a data set from the International Breast Cancer Study Group.
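The normal transformation model induces dependence between failure and censoring times by mapping each marginal distribution to the standard normal scale and letting the two normal scores be bivariate normal with correlation rho (a Gaussian copula). The sketch below simulates dependently censored data this way and shows the inverse mapping back to normal scores. Exponential margins, the parameter values, and the function names are assumptions for illustration; the P-spline baseline hazards and the Bayesian sampler are not shown.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def exp_time_from_normal(z, rate):
    """Exponential time whose normal score Phi^{-1}(F(t)) equals z,
    using 1 - Phi(z) = Phi(-z) for numerical stability."""
    return -math.log(STD_NORMAL.cdf(-z)) / rate

def normal_score(t, rate):
    """Map an exponential time back to the standard normal scale: Phi^{-1}(F(t))."""
    return STD_NORMAL.inv_cdf(-math.expm1(-rate * t))

def simulate_dependent_censoring(n, rho, rate_t, rate_c, seed=0):
    """Return (observed time, event indicator) pairs where the failure time T
    and the censoring time C have correlated normal scores (correlation rho)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        t = exp_time_from_normal(z1, rate_t)
        c = exp_time_from_normal(z2, rate_c)
        data.append((min(t, c), int(t <= c)))
    return data
```

With rho = 0 this reduces to the usual independent-censoring setup; a positive rho makes subjects who fail late also tend to be censored late, which biases estimators that assume independent censoring.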
This paper discusses nonparametric identification and estimation of the causal effect of a treatment in the presence of confounding, competing risks and random right-censoring. Our identification strategy is based on an instrumental variable. We show that the competing risks model generates a nonparametric quantile instrumental regression problem. Quantile treatment effects on the subdistribution function can be recovered from the regression function. A distinguishing feature of the model is that censoring and competing risks prevent identification at some quantiles. We characterize the set of quantiles for which exact identification is possible and give partial identification results for other quantiles. We outline an estimation procedure and discuss its properties. The finite-sample performance of the estimator is evaluated through simulations. We apply the proposed method to the Health Insurance Plan of Greater New York experiment.
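One way to see why competing risks and censoring block identification at some quantiles: the subdistribution quantile inf{t : F1(t) >= tau} exists only when tau does not exceed the largest attainable cumulative incidence of the cause of interest, which stays below one when competing events occur and is further limited by the censoring-restricted follow-up. The sketch below uses the Aalen-Johansen estimator without covariates or an instrument; it is not the paper's instrumental-variable procedure, and the function names are hypothetical.

```python
def cumulative_incidence(times, causes, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence (subdistribution)
    function for one cause. causes: 0 = censored, k > 0 = failure from cause k.
    Returns a list of (time, CIF) steps at the distinct event times."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_all = d_cause = n_here = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_here += 1
            if data[i][1] != 0:
                d_all += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        if d_all > 0:
            cif += surv * d_cause / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            steps.append((t, cif))
        n_at_risk -= n_here
    return steps

def subdistribution_quantile(steps, tau):
    """Smallest time at which the estimated CIF reaches tau, or None when
    tau exceeds the largest estimated CIF (the quantile is not identified)."""
    for t, value in steps:
        if value >= tau:
            return t
    return None
```

For the toy data `times=[1, 2, 3, 4, 5]`, `causes=[1, 2, 1, 0, 1]`, the estimated CIF for cause 1 never exceeds 0.8, so any quantile above that level is returned as `None`.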
In modern biomedical datasets, it is common for recurrent outcome data to be collected in an incomplete manner. More specifically, information on recurrent events is routinely recorded as a mixture of recurrent event data, panel count data, and panel binary data; we refer to this structure as general mixed recurrent event data. Although each of these data types is well studied individually, there appears to be no established approach for regression analysis of the three-component combination. Ad hoc measures such as imputing or discarding data are often used to homogenize records prior to analysis, but such measures raise concerns about robustness and loss of efficiency, among other issues. This work proposes a maximum likelihood regression procedure for general mixed recurrent event data and establishes the asymptotic properties of the proposed estimators. In addition, we generalize the approach to allow for terminal events, a common complicating feature in recurrent event analysis. Numerical studies and an application to the Childhood Cancer Survivor Study suggest that the proposed procedures work well in practical situations.
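To see how the three record types can enter one likelihood, consider the simplifying assumption that each subject's events follow a Poisson process with mean function Lambda(t | x) = rate * t * exp(beta * x). Exact event times, counts between visits, and "any event between visits" indicators then each contribute a different term. The homogeneous baseline, the record layout, and the function names are hypothetical; the paper's own likelihood and its terminal-event extension are not reproduced here.

```python
import math

def mean_increment(a, b, x, rate, beta):
    """Expected number of events in (a, b] under the assumed Poisson process."""
    return rate * (b - a) * math.exp(beta * x)

def loglik_recurrent(event_times, follow_up, x, rate, beta):
    """Exactly observed event times on (0, follow_up]."""
    log_intensity = math.log(rate) + beta * x
    return len(event_times) * log_intensity - mean_increment(0.0, follow_up, x, rate, beta)

def loglik_panel_count(visit_times, counts, x, rate, beta):
    """Event counts between successive visits (observation starts at time 0)."""
    ll, prev = 0.0, 0.0
    for t, n in zip(visit_times, counts):
        mu = mean_increment(prev, t, x, rate, beta)
        ll += n * math.log(mu) - mu - math.lgamma(n + 1)  # Poisson log-pmf
        prev = t
    return ll

def loglik_panel_binary(visit_times, any_event, x, rate, beta):
    """Only whether at least one event occurred between successive visits."""
    ll, prev = 0.0, 0.0
    for t, b in zip(visit_times, any_event):
        mu = mean_increment(prev, t, x, rate, beta)
        # P(no event) = exp(-mu); P(at least one event) = 1 - exp(-mu)
        ll += math.log(-math.expm1(-mu)) if b else -mu
        prev = t
    return ll

def total_loglik(records, rate, beta):
    """Sum contributions over subjects whose records are of different types."""
    funcs = {"recurrent": loglik_recurrent, "count": loglik_panel_count,
             "binary": loglik_panel_binary}
    return sum(funcs[kind](*args, rate=rate, beta=beta) for kind, args in records)
```

Maximizing `total_loglik` uses every subject's data as recorded, instead of coarsening all records to panel binary form or discarding the less informative ones.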
Despite the urgent need for effective prediction models tailored to individuals, existing models have mainly been developed for the mean outcome, that is, for the average person. Moreover, the direction and magnitude of covariate effects on the mean outcome may not hold across different quantiles of the outcome distribution. To accommodate heterogeneous covariate effects and provide a flexible risk model, we propose a quantile forward regression model for high-dimensional survival data. Our method selects variables by maximizing the likelihood of the asymmetric Laplace distribution (ALD) and determines the final model by the extended Bayesian Information Criterion (EBIC). We show that the proposed method enjoys the sure screening property and selection consistency. We apply it to a national health survey dataset to show the advantages of a quantile-specific prediction model. Finally, we discuss potential extensions of our approach, including the nonlinear model and the globally concerned quantile regression coefficients model.
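Maximizing the asymmetric Laplace likelihood ties in with quantile regression through the check loss: the ALD log-likelihood with skewness tau equals a constant minus the summed check loss divided by the scale, so maximizing it in the location parameter targets the tau-th quantile. The sketch below shows this for an intercept-only model. It ignores censoring, the forward selection steps, and the EBIC, and the function names are hypothetical.

```python
import math

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_loglik(residuals, tau, sigma=1.0):
    """Log-likelihood of residuals under an asymmetric Laplace distribution
    with location 0, scale sigma and skewness tau:
    f(u) = tau * (1 - tau) / sigma * exp(-rho_tau(u) / sigma)."""
    n = len(residuals)
    return n * math.log(tau * (1.0 - tau) / sigma) - sum(
        check_loss(r, tau) for r in residuals) / sigma

def best_constant(y, tau):
    """Maximize the ALD likelihood over a constant location, searching the
    observed values (the convex, piecewise-linear check loss always
    attains its minimum at one of them)."""
    return max(y, key=lambda c: ald_loglik([yi - c for yi in y], tau))
```

For `y = 1, 2, ..., 11` and `tau = 0.25`, the maximizer is 3, the lower-quartile observation, rather than the mean.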
The classical approach to analyzing time-to-event data, e.g. in clinical trials, is to estimate Kaplan-Meier curves and to summarize the treatment effect as the hazard ratio between treatment groups. A log-rank test is then commonly performed to investigate whether survival differs or, depending on additional covariates, a Cox proportional hazards model is used. However, in numerous trials these approaches fail because of non-proportional hazards, which make the hazard ratio difficult to interpret and cause a loss of power. In equivalence or non-inferiority trials, the commonly performed log-rank based tests are similarly affected by a violation of this assumption. Here we propose a parametric framework to assess equivalence or non-inferiority for survival data. We derive pointwise confidence bands for both the hazard ratio and the difference of the survival curves. Further, we propose a test procedure addressing non-inferiority and equivalence by directly comparing the survival functions at certain time points or over an entire range of time. Provided the parametric model is adequate, the method yields a noticeable gain in power, irrespective of the shape of the hazard ratio. On the other hand, model selection should be carried out carefully, as misspecification may inflate the type I error in some situations. We investigate the robustness and demonstrate the advantages and disadvantages of the proposed methods by means of a simulation study. Finally, we illustrate the methods with a clinical trial example.
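A pointwise non-inferiority comparison of two parametric survival curves at a fixed time t can be sketched with a delta-method standard error for the difference S_ref(t) - S_trt(t). The sketch below assumes an exponential model in each arm only to keep the example short; the margin, the significance level, and the function names are illustrative assumptions, and the confidence bands over a time range described above are not reproduced.

```python
import math
from statistics import NormalDist

def exp_fit(events, exposure):
    """MLE of an exponential hazard and its approximate variance
    (Var(rate_hat) ~ rate_hat**2 / events)."""
    rate = events / exposure
    return rate, rate ** 2 / events

def survival_difference(t, fit_ref, fit_trt):
    """S_ref(t) - S_trt(t) with a delta-method standard error."""
    (lam_r, var_r), (lam_t, var_t) = fit_ref, fit_trt
    s_r, s_t = math.exp(-lam_r * t), math.exp(-lam_t * t)
    # dS/dlambda = -t * exp(-lambda * t); the two arms are independent
    var = (t * s_r) ** 2 * var_r + (t * s_t) ** 2 * var_t
    return s_r - s_t, math.sqrt(var)

def non_inferior_at(t, fit_ref, fit_trt, margin, alpha=0.05):
    """Pointwise test of H0: S_ref(t) - S_trt(t) >= margin; non-inferiority
    is concluded when the upper one-sided (1 - alpha) bound is below margin."""
    diff, se = survival_difference(t, fit_ref, fit_trt)
    upper = diff + NormalDist().inv_cdf(1.0 - alpha) * se
    return upper < margin, upper
```

Because the comparison is on the survival scale rather than through a single hazard ratio, the test remains interpretable when hazards cross, although its validity still depends on the chosen parametric model fitting both arms.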
In studies of recurrent events, joint modeling approaches are often needed to allow for potential dependent censoring by a terminal event such as death. Joint frailty models for recurrent events and death with an additional dependence parameter have been studied for cases in which individuals are observed from the start of the event processes. However, samples are often selected at a later time, which results in delayed entry: only individuals who have not yet experienced the terminal event are included. In joint frailty models, such left truncation affects the frailty distribution, and this must be accounted for in both the recurrence process and the terminal event process if the two are associated. In a comprehensive simulation study, we demonstrate the consequences of not adjusting for late entry, and we derive the correctly adjusted marginal likelihood, which can be expressed as a ratio of two integrals over the frailty distribution. We extend the estimation method of Liu and Huang (Stat Med 27:2665-2683, 2008. https://doi.org/10.1002/sim.3077) to include potential left truncation. Numerical integration is performed by Gaussian quadrature, the baseline intensities are specified as piecewise constant functions, and potential covariates are assumed to have multiplicative effects on the intensities. We apply the method to estimate age-specific intensities of recurrent urinary tract infections and mortality in an older population.
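The left-truncation adjustment can be written for one subject as a ratio of two integrals over the frailty: the numerator integrates the likelihood of everything observed after entry, multiplied by survival to entry, and the denominator integrates the probability of being alive at entry. The sketch below evaluates both integrals by Gauss-Hermite quadrature with piecewise-constant baselines. It assumes a normal random effect b ~ N(0, sigma^2) acting as exp(b) on the recurrence intensity and exp(alpha * b) on the death hazard; covariates are omitted, and all function names are hypothetical rather than taken from Liu and Huang's implementation.

```python
import numpy as np

def piecewise_cumhaz(t, cuts, rates):
    """Cumulative hazard at t for a piecewise-constant hazard with interval
    boundaries cuts = [0, c1, ..., inf] and one rate per interval."""
    lo, hi = np.asarray(cuts[:-1]), np.asarray(cuts[1:])
    overlap = np.clip(np.minimum(t, hi) - lo, 0.0, None)
    return float(np.dot(rates, overlap))

def piecewise_hazard(t, cuts, rates):
    """Hazard value at time t (intervals closed on the right)."""
    idx = np.searchsorted(cuts, t, side="left") - 1
    return rates[max(idx, 0)]

def normal_quadrature(sigma, n_nodes=20):
    """Nodes b and weights w with sum(w * f(b)) ~ E[f(B)], B ~ N(0, sigma^2),
    via Gauss-Hermite quadrature after the change of variable b = sqrt(2) sigma x."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return np.sqrt(2.0) * sigma * x, w / np.sqrt(np.pi)

def truncated_marginal_lik(entry, exit_, event_times, died, r_cuts, r_rates,
                           d_cuts, d_rates, sigma, alpha, n_nodes=20):
    """Left-truncation-adjusted marginal likelihood for one subject observed
    from `entry` to `exit_`, as a ratio of two integrals over the frailty."""
    b, w = normal_quadrature(sigma, n_nodes)
    r_in = (piecewise_cumhaz(exit_, r_cuts, r_rates)
            - piecewise_cumhaz(entry, r_cuts, r_rates))
    h_exit = piecewise_cumhaz(exit_, d_cuts, d_rates)
    h_entry = piecewise_cumhaz(entry, d_cuts, d_rates)
    # recurrences after entry, plus death survival from 0 to exit_
    # (= survival to entry times survival from entry to exit_)
    log_num = (sum(np.log(piecewise_hazard(t, r_cuts, r_rates)) for t in event_times)
               + len(event_times) * b - r_in * np.exp(b)
               - h_exit * np.exp(alpha * b))
    if died:
        log_num = log_num + np.log(piecewise_hazard(exit_, d_cuts, d_rates)) + alpha * b
    numerator = np.dot(w, np.exp(log_num))
    denominator = np.dot(w, np.exp(-h_entry * np.exp(alpha * b)))  # P(alive at entry)
    return numerator / denominator
```

Ignoring the denominator treats late entrants as if their frailty followed the population distribution, whereas survivors to a late entry age are selected toward lower frailty when alpha > 0; the ratio corrects for that selection.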