Ying Sheng, Yifei Sun, C. E. McCulloch, Chiung-Yu Huang
Statistica Sinica, 2024. DOI: 10.5705/ss.202022.0028 (https://doi.org/10.5705/ss.202022.0028)
Scalable Estimation for High Velocity Survival Data Able to Accommodate Addition of Covariates
Abstract: With the rapidly increasing availability of large-scale streaming data, there has been growing interest in methods that process data in batches without requiring storage of the full dataset. In this paper, we propose a hybrid likelihood approach for scalable estimation of the Cox model using individual-level data in the current data batch and summary statistics calculated from historical data. We show that the proposed scalable estimator is asymptotically as efficient as the maximum likelihood estimator calculated using the entire dataset, while requiring little data storage and little loading and computation time. A challenge in analyzing survival data batches that extant methods do not accommodate is that new covariates may become available midway through data collection. To accommodate the addition of covariates, we develop a hybrid empirical likelihood approach that incorporates the historical covariate effects evaluated in a reduced Cox model. The extended scalable estimator is asymptotically more efficient than the maximum likelihood estimator obtained using only the data batches that include the additional covariates. The proposed approaches are evaluated by numerical simulations and illustrated with an analysis of Surveillance, Epidemiology, and End Results breast cancer data.
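To make the batch-plus-summary-statistics idea concrete, the following is a minimal sketch and not the hybrid likelihood estimator of the paper, whose details the abstract does not give. As a stand-in it fits a Cox model to each batch by Newton-Raphson and keeps only two running summaries, the accumulated information matrix and the information-weighted sum of batch estimates. The function names, the simulated data, and the true coefficients are all assumptions made for illustration, and tied event times are assumed absent.

```python
import numpy as np

def cox_fit(time, event, X, n_iter=25):
    """Newton-Raphson for the Cox partial likelihood (assumes no tied times)."""
    order = np.argsort(-time)              # descending time: risk set is a prefix
    X = X[order]
    event = event[order].astype(bool)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        s0 = np.cumsum(w)                                   # sum of weights in risk set
        s1 = np.cumsum(w[:, None] * X, axis=0)              # weighted covariate sums
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = (s1 / s0[:, None])[event]                    # risk-set covariate means
        score = (X[event] - xbar).sum(axis=0)
        info = (s2[event] / s0[event][:, None, None]
                - xbar[:, :, None] * xbar[:, None, :]).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, info                       # info is from the last Newton iteration

# Hypothetical simulation settings, chosen only for illustration.
rng = np.random.default_rng(0)
beta_true = np.array([0.5, -0.5])

def simulate(n):
    X = rng.normal(size=(n, 2))
    t = rng.exponential(np.exp(-(X @ beta_true)))   # hazard exp(x'beta)
    c = rng.exponential(2.0, size=n)                # independent censoring
    return np.minimum(t, c), (t <= c).astype(int), X

batches = [simulate(400) for _ in range(5)]

# Streaming pass: only J (p x p) and h (p) are carried between batches.
J = np.zeros((2, 2))
h = np.zeros(2)
for time, event, X in batches:
    b, info = cox_fit(time, event, X)
    J += info
    h += info @ b
beta_stream = np.linalg.solve(J, h)

# Full-data fit, kept here only to compare; a streaming system would not store it.
beta_full, _ = cox_fit(np.concatenate([b[0] for b in batches]),
                       np.concatenate([b[1] for b in batches]),
                       np.vstack([b[2] for b in batches]))
print(beta_stream, beta_full)
```

Under these assumptions the streaming estimate should land close to the full-data fit while storing only a p-by-p matrix and a length-p vector between batches. The paper's approach differs in that it combines the current batch's individual-level likelihood with the historical summaries rather than averaging separate batch fits.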
About the journal:
Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.