{"title":"大规模最小二乘回归的现代子抽样方法","authors":"Tao Li, Cheng Meng","doi":"10.4018/IJCPS.2020070101","DOIUrl":null,"url":null,"abstract":"Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model prediction. This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation. Two major families of subsampling methods are introduced: the randomized subsampling approach and the optimal subsampling approach. The former aims to develop a more effective data-dependent sampling probability while the latter aims to select a deterministic subsample in accordance with certain optimality criteria. Real data examples are provided to compare these methods empirically, respecting both the estimation accuracy and the computing time.","PeriodicalId":198135,"journal":{"name":"Int. J. Cyber Phys. Syst.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Modern Subsampling Methods for Large-Scale Least Squares Regression\",\"authors\":\"Tao Li, Cheng Meng\",\"doi\":\"10.4018/IJCPS.2020070101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model prediction. This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation. Two major families of subsampling methods are introduced: the randomized subsampling approach and the optimal subsampling approach. The former aims to develop a more effective data-dependent sampling probability while the latter aims to select a deterministic subsample in accordance with certain optimality criteria. Real data examples are provided to compare these methods empirically, respecting both the estimation accuracy and the computing time.\",\"PeriodicalId\":198135,\"journal\":{\"name\":\"Int. J. Cyber Phys. Syst.\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Cyber Phys. Syst.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/IJCPS.2020070101\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Cyber Phys. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/IJCPS.2020070101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Modern Subsampling Methods for Large-Scale Least Squares Regression
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model prediction. This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation. Two major families of subsampling methods are introduced: the randomized subsampling approach and the optimal subsampling approach. The former aims to develop a more effective data-dependent sampling probability while the latter aims to select a deterministic subsample in accordance with certain optimality criteria. Real data examples are provided to compare these methods empirically, respecting both the estimation accuracy and the computing time.