Title: Optimal subsampling for the Cox proportional hazards model with massive survival data
Authors: Nan Qiao, Wangcheng Li, Feng Xiao, Cunjie Lin
Journal: Journal of Statistical Planning and Inference, volume 231, Article 106136
Publication date: 2023-12-19
DOI: 10.1016/j.jspi.2023.106136
URL: https://www.sciencedirect.com/science/article/pii/S0378375823001052
Citation count: 0
Abstract
Massive survival data sets have become common in survival analysis. This study proposes a subsampling algorithm for the Cox proportional hazards model with time-dependent covariates, intended for settings where the sample size is extraordinarily large but computing resources are relatively limited. A subsample estimator is obtained by maximizing a weighted partial likelihood and is shown to be consistent and asymptotically normal. The optimal subsampling probabilities are derived in explicit form by minimizing the asymptotic mean squared error of the subsample estimator. Simulation studies show that the proposed method approximates the full-data estimator satisfactorily. The method is applied to corporate loan data and breast cancer data with different censoring rates, and the results confirm its practical advantages.
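The idea of a subsample estimator that maximizes a weighted partial likelihood can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the covariates are time-fixed rather than time-dependent, the subsampling probabilities `pi` are uniform placeholders (the paper's optimal probabilities are not stated in the abstract and are not reproduced here), the subsample is drawn without replacement so that event times stay untied, and the function name `weighted_cox_fit` and the simulated data are hypothetical.

```python
import numpy as np

def weighted_cox_fit(time, event, X, w, n_iter=25, tol=1e-8):
    """Maximize a weighted Cox partial log-likelihood by Newton's method.

    Illustrative assumptions: time-fixed covariates, no tied event times.
    Objective: sum_i w_i * d_i * [x_i'b - log(sum_{j: t_j >= t_i} w_j * exp(x_j'b))].
    """
    order = np.argsort(-time)  # descending time: cumulative sums run over risk sets
    event, X, w = event[order], X[order], w[order]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        risk = w * np.exp(X @ beta)                      # weighted risk scores
        S0 = np.cumsum(risk)                             # risk-set sums
        S1 = np.cumsum(risk[:, None] * X, axis=0)        # risk-set sums of x
        S2 = np.cumsum(risk[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = S1 / S0[:, None]                          # weighted risk-set means
        d = w * event                                    # weighted event indicators
        grad = (d[:, None] * (X - xbar)).sum(axis=0)     # score vector
        cov = S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        info = (d[:, None, None] * cov).sum(axis=0)      # negative Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (hypothetical): exponential event times, independent censoring.
rng = np.random.default_rng(0)
n, r = 20000, 2000
beta_true = np.array([0.5, -0.5])
X = rng.normal(size=(n, 2))
T = rng.exponential(1.0 / np.exp(X @ beta_true))  # event times with hazard exp(x'b)
C = rng.exponential(2.0, size=n)                  # censoring times
time = np.minimum(T, C)
event = (T <= C).astype(float)

# Full-data estimator: every observation has weight 1.
beta_full = weighted_cox_fit(time, event, X, np.ones(n))

# Subsample estimator: draw r observations with probabilities pi and
# weight each drawn observation by the inverse of its probability.
pi = np.full(n, 1.0 / n)                          # uniform placeholder probabilities
idx = rng.choice(n, size=r, replace=False, p=pi)
w_sub = 1.0 / (n * pi[idx])
beta_sub = weighted_cox_fit(time[idx], event[idx], X[idx], w_sub)

print("full-data estimate:", beta_full)
print("subsample estimate:", beta_sub)
```

Replacing the uniform `pi` with non-uniform probabilities changes only the sampling step and the inverse-probability weights; the fitting routine itself stays the same, which is why the weights enter both the event terms and the risk-set sums.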
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.