Distributed optimal subsampling for quantile regression with massive data
Authors: Yue Chao, Xuejun Ma, Boya Zhu
DOI: 10.1016/j.jspi.2024.106186
Publication date: 2024-04-18
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0378375824000430
Reducing distributed subsample sizes has become an increasingly popular statistical problem in the big data era. Optimal subsample selection for massive linear and generalized linear models with distributed data sources has been thoroughly investigated and widely applied. Nevertheless, few studies have developed distributed optimal subsample selection procedures for quantile regression with massive data. In such settings, the distributed optimal subsampling probabilities and the criteria for selecting subset sizes need to be established simultaneously. In this work, we propose a distributed subsampling technique for quantile regression models. The estimation approach is based on a two-step algorithm for the distributed subsampling procedure. Furthermore, theoretical results, including consistency and asymptotic normality of the resulting estimators, are rigorously established under some regularity conditions. The performance of the proposed subsampling method is evaluated in simulation experiments and real data applications.
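
For orientation, the setting described in the abstract can be written down generically as follows. This is only a sketch in standard quantile regression notation, not the paper's exact formulation. All symbols are assumptions introduced here for illustration: K data blocks, block index sets B_k, per-block subsample sizes r_k, drawn subsamples S_k, and sampling probabilities \pi_{ik}. The abstract does not state the form of the optimal probabilities or the subset size criteria, so they are left unspecified.

```latex
% Check loss for the \tau-th quantile, 0 < \tau < 1
\rho_\tau(u) = u\,\{\tau - I(u < 0)\}

% Full-data quantile regression estimator over K blocks B_1, \dots, B_K
\hat{\beta} = \arg\min_{\beta} \sum_{k=1}^{K} \sum_{i \in B_k}
    \rho_\tau\!\left(y_i - x_i^{\top}\beta\right)

% Inverse-probability-weighted subsample estimator: within block k,
% r_k points S_k are drawn with replacement using probabilities \pi_{ik},
% where \sum_{i \in B_k} \pi_{ik} = 1
\tilde{\beta} = \arg\min_{\beta} \sum_{k=1}^{K} \frac{1}{r_k}
    \sum_{i \in S_k} \frac{\rho_\tau\!\left(y_i - x_i^{\top}\beta\right)}{\pi_{ik}}
```

Under these assumptions, each block's weighted sum is an unbiased estimate of that block's total loss, so the subsample objective targets the full-data objective. A two-step scheme of the kind the abstract mentions would typically first compute a pilot estimate from a uniform subsample (\pi_{ik} equal within each block), then use that pilot estimate to compute data-dependent probabilities and subset sizes for a second subsample.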