Optimal subsampling algorithms for composite quantile regression in massive data

Statistics · IF 1.2 · CAS Region 4 (Mathematics) · JCR Q2 (Statistics & Probability) · Pub Date: 2023-07-04 · DOI: 10.1080/02331888.2023.2239507
Jun Jin, Shuangzhe Liu, Tiefeng Ma
{"title":"Optimal subsampling algorithms for composite quantile regression in massive data","authors":"Jun Jin, Shuangzhe Liu, Tiefeng Ma","doi":"10.1080/02331888.2023.2239507","DOIUrl":null,"url":null,"abstract":"Massive datasets have gained increasing prominence across various fields, but their analysis is often impeded by computational limitations. In response, Wang and Ma (Optimal subsampling for quantile regression in big data. Biometrika. 2021;108:99–112) have proposed an optimal subsampling method for quantile regression in massive datasets. Composite quantile regression, as a robust and efficient alternative to ordinary least squares regression and quantile regression in linear models, presents further complexities due to its distinct loss function. This paper extends the optimal subsampling method to accommodate composite quantile regression problems. We begin by deriving two new optimal subsampling probabilities for composite quantile regression, considering both the L- and A-optimality criteria Subsequently, we develop an adaptive two-step method based on these probabilities. The resulting estimators exhibit desirable asymptotic properties. In addition, to estimate the variance-covariance matrix without explicitly estimating the densities of the responses, we propose a combining subsamples method. Numerical studies on simulated and real data are conducted to assess and showcase the practical performance of our proposed methods.","PeriodicalId":54358,"journal":{"name":"Statistics","volume":"29 1","pages":"811 - 843"},"PeriodicalIF":1.2000,"publicationDate":"2023-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1080/02331888.2023.2239507","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Citations: 0

Abstract

Massive datasets have gained increasing prominence across various fields, but their analysis is often impeded by computational limitations. In response, Wang and Ma (Optimal subsampling for quantile regression in big data. Biometrika. 2021;108:99–112) proposed an optimal subsampling method for quantile regression in massive datasets. Composite quantile regression, a robust and efficient alternative to ordinary least squares regression and quantile regression in linear models, presents further complexities due to its distinct loss function. This paper extends the optimal subsampling method to composite quantile regression problems. We begin by deriving two new optimal subsampling probabilities for composite quantile regression under both the L- and A-optimality criteria. Subsequently, we develop an adaptive two-step method based on these probabilities. The resulting estimators exhibit desirable asymptotic properties. In addition, to estimate the variance-covariance matrix without explicitly estimating the densities of the responses, we propose a combining-subsamples method. Numerical studies on simulated and real data are conducted to assess and showcase the practical performance of the proposed methods.
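The two-step idea described above can be made concrete with a short Python sketch. This is a minimal illustration, not the authors' implementation: `fit_cqr` is a hypothetical helper that minimizes the composite check loss by crude subgradient descent, and the second-step scoring rule is only a heuristic stand-in for the L-/A-optimal subsampling probabilities derived in the paper, which involve additional quantities (e.g. density terms) omitted here. What the sketch does capture is the structure of the procedure: a composite check loss over several quantile levels, a pilot fit from a small uniform subsample, nonuniform second-stage sampling probabilities, and an inverse-probability-weighted refit.

```python
import numpy as np

def fit_cqr(X, y, taus, w=None, n_iter=3000, lr=0.05):
    """Composite quantile regression via weighted subgradient descent on
    sum_k sum_i w_i * rho_{tau_k}(y_i - b_k - x_i' beta).  Illustration only."""
    n, p = X.shape
    taus = np.asarray(taus)
    K = len(taus)
    w = np.ones(n) if w is None else np.asarray(w) * n / np.sum(w)  # mean-one weights
    params = np.zeros(p + K)                                        # [beta, b_1, ..., b_K]
    for t in range(n_iter):
        beta, b = params[:p], params[p:]
        R = y[:, None] - (X @ beta)[:, None] - b[None, :]           # residuals, shape (n, K)
        S = ((R < 0).astype(float) - taus) * w[:, None]             # check-loss subgradients
        grad = np.concatenate([X.T @ S.sum(axis=1) / n, S.mean(axis=0)])
        params -= lr / np.sqrt(t + 1.0) * grad                      # decaying step size
    return params

rng = np.random.default_rng(0)
N, p = 100_000, 3
beta_true = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(N, p))
y = X @ beta_true + rng.standard_t(df=3, size=N)                    # heavy-tailed errors
taus = np.arange(1, 6) / 6.0                                        # K = 5 quantile levels

# Step 1: pilot CQR fit from a small uniform subsample.
idx0 = rng.choice(N, size=500, replace=False)
pilot = fit_cqr(X[idx0], y[idx0], taus)

# Step 2: nonuniform subsampling probabilities.  As a stand-in for the paper's
# L-optimal rule, score each point by ||x_i|| times the magnitude of its
# aggregated check-loss subgradient at the pilot fit (an assumption).
R = y[:, None] - (X @ pilot[:p])[:, None] - pilot[p:][None, :]
score = np.abs(((R < 0).astype(float) - taus).sum(axis=1))
probs = np.linalg.norm(X, axis=1) * (score + 1e-8)
probs /= probs.sum()

# Step 3: draw the second-stage subsample and refit with inverse-probability
# weights so the subsample loss approximates the full-data CQR objective.
idx1 = rng.choice(N, size=2_000, replace=True, p=probs)
theta = fit_cqr(X[idx1], y[idx1], taus, w=1.0 / probs[idx1])
print("slope estimate from weighted subsample:", np.round(theta[:p], 3))
```

The inverse-probability weights in the refit are what keep the subsample estimator aimed at the full-data CQR estimator when the sampling probabilities are nonuniform; dropping them would bias the fit toward the over-sampled points.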
Source journal: Statistics (Mathematics · Statistics & Probability)
CiteScore: 1.00 · Self-citation rate: 0.00% · Annual articles: 59 · Review time: 12 months
Journal description: Statistics publishes papers developing and analysing new methods for any active field of statistics, motivated by real-life problems. Papers submitted for consideration should provide interesting and novel contributions to statistical theory and its applications with rigorous mathematical results and proofs. Moreover, numerical simulations and application to real data sets can improve the quality of papers, and should be included where appropriate. Statistics does not publish papers which represent mere application of existing procedures to case studies, and papers are required to contain methodological or theoretical innovation. Topics of interest include, for example, nonparametric statistics, time series, analysis of topological or functional data. Furthermore the journal also welcomes submissions in the field of theoretical econometrics and its links to mathematical statistics.
Latest articles in this journal:
- Robust estimator of the ruin probability in infinite time for heavy-tailed distributions
- Gaussian modeling with B-splines for spatial functional data on irregular domains
- A note on the asymptotic behavior of a mildly unstable integer-valued AR(1) model
- Explainable machine learning for financial risk management: two practical use cases
- Online updating Huber robust regression for big data streams