Screen then select: a strategy for correlated predictors in high-dimensional quantile regression
Pub Date : 2024-04-08 DOI: 10.1007/s11222-024-10424-6
Xuejun Jiang, Yakun Liang, Haofeng Wang
Strong correlation among predictors and heavy-tailed noise pose a great challenge in the analysis of ultra-high-dimensional data: they increase the computation time needed to discover active variables and decrease selection accuracy. To address this issue, we propose an innovative two-stage screen-then-select approach, and a derivative procedure, based on robust quantile regression with a sparsity assumption. The approach first screens important features by ranking quantile ridge estimates and then employs a likelihood-based post-screening selection strategy to refine the variable selection. Additionally, we introduce an internal competition mechanism along the greedy search path to enhance the robustness of the algorithm against dependence in the design. Our methods are simple to implement and possess numerous desirable properties from theoretical and computational standpoints. Theoretically, we establish the strong consistency of feature selection for the proposed methods under some regularity conditions. In empirical studies, we assess the finite-sample performance of our methods by comparing them with utility screening approaches and existing penalized quantile regression methods. Furthermore, we apply our methods to identify genes associated with anticancer drug sensitivities for practical guidance.
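A rough sketch of the screen-then-select idea (not the authors' exact ranking statistic or selection rule): fit a ridge-penalized, convolution-smoothed quantile loss by plain gradient descent, rank coefficients by magnitude, and keep the top d predictors for a subsequent selection stage. The bandwidth h, learning rate, and screening size d below are illustrative choices.

```python
import numpy as np

def smoothed_check_grad(r, tau, h):
    """Derivative of a smoothed check (pinball) loss w.r.t. the residuals r,
    returned with the sign needed for the chain rule through r = y - X @ beta."""
    psi = np.where(r > h, tau, np.where(r < -h, tau - 1.0, tau - 0.5 + 0.5 * r / h))
    return -psi

def quantile_ridge_screen(X, y, tau=0.5, lam=1.0, d=None, h=0.1, lr=0.01, n_iter=2000):
    """Stage 1 (screening): rank predictors by ridge-penalized quantile coefficients
    and keep the top d; a likelihood-based selection stage would follow."""
    n, p = X.shape
    d = d or int(n / np.log(n))
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = X.T @ smoothed_check_grad(r, tau, h) / n + 2 * lam * beta / n
        beta -= lr * grad
    ranking = np.argsort(-np.abs(beta))
    return ranking[:d], beta

# toy example: 5 active predictors among p = 500, heavy-tailed (t_2) noise
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, -1.0, 0.5]) + rng.standard_t(df=2, size=n)
screened, _ = quantile_ridge_screen(X, y, tau=0.5)
print("top-ranked predictors:", screened[:10])
```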
{"title":"Screen then select: a strategy for correlated predictors in high-dimensional quantile regression","authors":"Xuejun Jiang, Yakun Liang, Haofeng Wang","doi":"10.1007/s11222-024-10424-6","DOIUrl":"https://doi.org/10.1007/s11222-024-10424-6","url":null,"abstract":"<p>Strong correlation among predictors and heavy-tailed noises pose a great challenge in the analysis of ultra-high dimensional data. Such challenge leads to an increase in the computation time for discovering active variables and a decrease in selection accuracy. To address this issue, we propose an innovative two-stage screen-then-select approach and its derivative procedure based on a robust quantile regression with sparsity assumption. This approach initially screens important features by ranking quantile ridge estimation and subsequently employs a likelihood-based post-screening selection strategy to refine variable selection. Additionally, we conduct an internal competition mechanism along the greedy search path to enhance the robustness of algorithm against the design dependence. Our methods are simple to implement and possess numerous desirable properties from theoretical and computational standpoints. Theoretically, we establish the strong consistency of feature selection for the proposed methods under some regularity conditions. In empirical studies, we assess the finite sample performance of our methods by comparing them with utility screening approaches and existing penalized quantile regression methods. Furthermore, we apply our methods to identify genes associated with anticancer drug sensitivities for practical guidance.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R-VGAL: a sequential variational Bayes algorithm for generalised linear mixed models
Pub Date : 2024-04-06 DOI: 10.1007/s11222-024-10422-8
Bao Anh Vu, David Gunawan, Andrew Zammit-Mangion
Models with random effects, such as generalised linear mixed models (GLMMs), are often used for analysing clustered data. Parameter inference with these models is difficult because of the presence of cluster-specific random effects, which must be integrated out when evaluating the likelihood function. Here, we propose a sequential variational Bayes algorithm, called Recursive Variational Gaussian Approximation for Latent variable models (R-VGAL), for estimating parameters in GLMMs. The R-VGAL algorithm operates on the data sequentially, requires only a single pass through the data, and can provide parameter updates as new data are collected without the need to re-process the previous data. At each update, R-VGAL requires the gradient and Hessian of a "partial" log-likelihood function evaluated at the new observation, which are generally not available in closed form for GLMMs. To circumvent this issue, we propose an importance-sampling-based approach for estimating the gradient and Hessian via Fisher's and Louis' identities. We find that R-VGAL can be unstable when traversing the first few data points, but that this issue can be mitigated by introducing a damping factor in the initial steps of the algorithm. Through illustrations on both simulated and real datasets, we show that R-VGAL provides good approximations to posterior distributions, that it can be made robust through damping, and that it is computationally efficient.
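The flavour of the sequential update can be conveyed with a stripped-down recursive variational Gaussian approximation for plain logistic regression, where the per-observation gradient and Hessian are available in closed form and are evaluated at the current mean rather than estimated by importance sampling; the damping schedule below is an illustrative simplification of the paper's damping idea, not the R-VGAL algorithm itself.

```python
import numpy as np

def recursive_vga_logistic(X, y, prior_var=10.0, damping_steps=50, rho_min=0.1):
    """One pass over the data: update a Gaussian approximation N(mu, Sigma) to the
    posterior of logistic-regression coefficients, one observation at a time,
    with damped updates in the early steps to stabilise the recursion."""
    n, p = X.shape
    mu = np.zeros(p)
    prec = np.eye(p) / prior_var                  # precision of the current approximation
    for t in range(n):
        x_t, y_t = X[t], y[t]
        pi = 1.0 / (1.0 + np.exp(-(x_t @ mu)))
        grad = (y_t - pi) * x_t                   # gradient of log p(y_t | beta) at mu
        hess = -pi * (1 - pi) * np.outer(x_t, x_t)
        rho = max(rho_min, min(1.0, (t + 1) / damping_steps))   # damping factor
        prec = prec - rho * hess                  # accumulate curvature information
        mu = mu + rho * np.linalg.solve(prec, grad)
    return mu, np.linalg.inv(prec)

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
mu, Sigma = recursive_vga_logistic(X, y)
print("approximate posterior mean:", np.round(mu, 2))
```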
{"title":"R-VGAL: a sequential variational Bayes algorithm for generalised linear mixed models","authors":"Bao Anh Vu, David Gunawan, Andrew Zammit-Mangion","doi":"10.1007/s11222-024-10422-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10422-8","url":null,"abstract":"<p>Models with random effects, such as generalised linear mixed models (GLMMs), are often used for analysing clustered data. Parameter inference with these models is difficult because of the presence of cluster-specific random effects, which must be integrated out when evaluating the likelihood function. Here, we propose a sequential variational Bayes algorithm, called Recursive Variational Gaussian Approximation for Latent variable models (R-VGAL), for estimating parameters in GLMMs. The R-VGAL algorithm operates on the data sequentially, requires only a single pass through the data, and can provide parameter updates as new data are collected without the need of re-processing the previous data. At each update, the R-VGAL algorithm requires the gradient and Hessian of a “partial” log-likelihood function evaluated at the new observation, which are generally not available in closed form for GLMMs. To circumvent this issue, we propose using an importance-sampling-based approach for estimating the gradient and Hessian via Fisher’s and Louis’ identities. We find that R-VGAL can be unstable when traversing the first few data points, but that this issue can be mitigated by introducing a damping factor in the initial steps of the algorithm. Through illustrations on both simulated and real datasets, we show that R-VGAL provides good approximations to posterior distributions, that it can be made robust through damping, and that it is computationally efficient.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated generation of initial points for adaptive rejection sampling of log-concave distributions
Pub Date : 2024-04-05 DOI: 10.1007/s11222-024-10425-5
Jonathan James
Adaptive rejection sampling requires that users provide points that span the distribution's mode. If these points are far from the mode, computational costs increase significantly. This paper introduces a simple, automated approach for selecting initial points that uses numerical optimization to quickly bracket the mode. When the starting point lies in a high-density region, the method often requires only four function evaluations to draw a sample—just one more than the sampler's minimum. This feature makes it well suited for Gibbs sampling, where the previous round's draw can serve as the starting point.
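A minimal sketch of the bracketing idea, assuming a generic step-doubling search on the log-density rather than the paper's specific optimization routine:

```python
import numpy as np

def bracket_mode(log_f, x0, step=1.0, grow=2.0, max_iter=50):
    """Return two points that straddle the mode of a log-concave density: starting
    from x0, move uphill and grow the step until the derivative of log f changes sign
    (for a log-concave density the derivative is non-increasing)."""
    eps = 1e-6
    dlog = lambda x: (log_f(x + eps) - log_f(x - eps)) / (2 * eps)
    s = step if dlog(x0) > 0 else -step       # move towards the mode
    x_prev, x_next = x0, x0 + s
    for _ in range(max_iter):
        if np.sign(dlog(x_next)) != np.sign(dlog(x_prev)):
            lo, hi = sorted((x_prev, x_next))
            return lo, hi                      # these two points span the mode
        x_prev, x_next = x_next, x_next + s * grow
        s *= grow
    raise RuntimeError("failed to bracket the mode")

# example: standard normal log-density, starting far out in the tail
log_f = lambda x: -0.5 * x ** 2
print(bracket_mode(log_f, x0=8.0))
```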
{"title":"Automated generation of initial points for adaptive rejection sampling of log-concave distributions","authors":"Jonathan James","doi":"10.1007/s11222-024-10425-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10425-5","url":null,"abstract":"<p>Adaptive rejection sampling requires that users provide points that span the distribution’s mode. If these points are far from the mode, it significantly increases computational costs. This paper introduces a simple, automated approach for selecting initial points that uses numerical optimization to quickly bracket the mode. When an initial point is given that resides in a high-density area, the method often requires just four function evaluations to draw a sample—just one more than the sampler’s minimum. This feature makes it well-suited for Gibbs sampling, where the previous round’s draw can serve as the starting point.\u0000</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parsimonious ultrametric Gaussian mixture models
Pub Date : 2024-04-01 DOI: 10.1007/s11222-024-10405-9
Carlo Cavicchia, Maurizio Vichi, Giorgia Zaccaria
Gaussian mixture models represent a conceptually and mathematically elegant class of models for modelling the density of a heterogeneous population, where the observed data are collected from a population composed of a finite set of G homogeneous subpopulations, each with a Gaussian distribution. A limitation of these models is that they suffer from the curse of dimensionality: the number of parameters easily becomes extremely large in the presence of high-dimensional data. In this paper, we propose a class of parsimonious Gaussian mixture models with constrained extended ultrametric covariance structures that are capable of exploring hierarchical relations among variables. The proposal requires a reduced number of parameters to be fitted and includes constrained covariance structures across and within components that further reduce the number of parameters of the model.
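The parsimony motivation is easy to see by counting free parameters in a plain Gaussian mixture under standard covariance constraints; the ultrametric covariance structure itself is more elaborate and is not reproduced here.

```python
# Free parameters of a G-component Gaussian mixture in p dimensions:
#   mixing weights: G - 1,  means: G * p,  covariance parameters depend on the structure.
def gmm_n_params(G, p, cov="full"):
    cov_params = {
        "full": G * p * (p + 1) // 2,   # unrestricted covariance per component
        "tied": p * (p + 1) // 2,       # one covariance shared by all components
        "diag": G * p,                  # diagonal covariance per component
        "spherical": G,                 # a single variance per component
    }[cov]
    return (G - 1) + G * p + cov_params

for cov in ("full", "tied", "diag", "spherical"):
    print(f"G=5, p=100, {cov:9s}: {gmm_n_params(5, 100, cov):7d} parameters")
```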
{"title":"Parsimonious ultrametric Gaussian mixture models","authors":"Carlo Cavicchia, Maurizio Vichi, Giorgia Zaccaria","doi":"10.1007/s11222-024-10405-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10405-9","url":null,"abstract":"<p>Gaussian mixture models represent a conceptually and mathematically elegant class of models for casting the density of a heterogeneous population where the observed data is collected from a population composed of a finite set of <i>G</i> homogeneous subpopulations with a Gaussian distribution. A limitation of these models is that they suffer from the curse of dimensionality, and the number of parameters becomes easily extremely large in the presence of high-dimensional data. In this paper, we propose a class of parsimonious Gaussian mixture models with constrained extended ultrametric covariance structures that are capable of exploring hierarchical relations among variables. The proposal shows to require a reduced number of parameters to be fit and includes constrained covariance structures across and within components that further reduce the number of parameters of the model.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic three-term conjugate gradient method with variance technique for non-convex learning
Pub Date : 2024-03-27 DOI: 10.1007/s11222-024-10409-5
Chen Ouyang, Chenkaixiang Lu, Xiong Zhao, Ruping Huang, Gonglin Yuan, Yiyan Jiang
In the training process of machine learning, minimization of the empirical risk loss function is often used to measure the difference between the model's predicted value and the true value. Stochastic gradient descent is very popular for this type of optimization problem but converges slowly in theoretical analysis. To address this, many algorithms with variance reduction techniques have been proposed, such as SVRG, SAG, and SAGA. Some scholars have applied the conjugate gradient method from traditional optimization to these algorithms, yielding CGVR, SCGA, SCGN, and others, which can essentially achieve a linear convergence rate, although these conclusions often need to be established under relatively strong assumptions. In traditional optimization, the conjugate gradient method often requires line search techniques to achieve good experimental results, and in a sense line search embodies some properties of conjugate gradient methods. Taking inspiration from this, we apply a modified three-term conjugate gradient method and a line search technique to machine learning. In our theoretical analysis, we obtain the same convergence rate as SCGA under weaker assumptions. We also test the convergence of our algorithm on two non-convex machine learning models.
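A rough sketch of how these ingredients fit together, assuming SVRG-style variance-reduced gradients, a generic three-term direction of the form d_k = -g_k + beta_k d_{k-1} - theta_k y_{k-1}, and a backtracking Armijo-type line search on the mini-batch loss; the test problem is a simple logistic regression rather than the non-convex models used in the paper, and this is not the authors' exact update.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Logistic loss and gradient (labels in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def svrg_three_term_cg(X, y, n_epochs=10, batch=10, c1=1e-4):
    n, p_dim = X.shape
    w = np.zeros(p_dim)
    d_prev, g_prev = None, None
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        _, full_grad = logistic_loss_grad(w, X, y)        # SVRG snapshot gradient
        w_snap = w.copy()
        for _ in range(n // batch):
            idx = rng.choice(n, batch, replace=False)
            _, g_b = logistic_loss_grad(w, X[idx], y[idx])
            _, g_snap_b = logistic_loss_grad(w_snap, X[idx], y[idx])
            g = g_b - g_snap_b + full_grad                # variance-reduced gradient
            if d_prev is None:
                d = -g
            else:                                         # three-term CG direction
                yk = g - g_prev
                denom = d_prev @ yk
                if abs(denom) < 1e-12:
                    d = -g
                else:
                    beta = (g @ yk) / denom
                    theta = (g @ d_prev) / denom
                    d = -g + beta * d_prev - theta * yk
            t, loss0 = 1.0, logistic_loss_grad(w, X[idx], y[idx])[0]
            # backtracking (Armijo-type) line search on the mini-batch loss
            while logistic_loss_grad(w + t * d, X[idx], y[idx])[0] > loss0 + c1 * t * (g @ d) and t > 1e-4:
                t *= 0.5
            w = w + t * d
            d_prev, g_prev = d, g
    return w

rng = np.random.default_rng(2)
n, p = 1000, 20
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = (X @ w_true + 0.5 * rng.standard_normal(n) > 0).astype(float)
w_hat = svrg_three_term_cg(X, y)
print("training loss:", round(logistic_loss_grad(w_hat, X, y)[0], 3))
```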
{"title":"Stochastic three-term conjugate gradient method with variance technique for non-convex learning","authors":"Chen Ouyang, Chenkaixiang Lu, Xiong Zhao, Ruping Huang, Gonglin Yuan, Yiyan Jiang","doi":"10.1007/s11222-024-10409-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10409-5","url":null,"abstract":"<p>In the training process of machine learning, the minimization of the empirical risk loss function is often used to measure the difference between the model’s predicted value and the real value. Stochastic gradient descent is very popular for this type of optimization problem, but converges slowly in theoretical analysis. To solve this problem, there are already many algorithms with variance reduction techniques, such as SVRG, SAG, SAGA, etc. Some scholars apply the conjugate gradient method in traditional optimization to these algorithms, such as CGVR, SCGA, SCGN, etc., which can basically achieve linear convergence speed, but these conclusions often need to be established under some relatively strong assumptions. In traditional optimization, the conjugate gradient method often requires the use of line search techniques to achieve good experimental results. In a sense, line search embodies some properties of the conjugate methods. Taking inspiration from this, we apply the modified three-term conjugate gradient method and line search technique to machine learning. In our theoretical analysis, we obtain the same convergence rate as SCGA under weaker conditional assumptions. We also test the convergence of our algorithm using two non-convex machine learning models.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140311638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel sampling method for the von Mises–Fisher distribution
Pub Date : 2024-03-26 DOI: 10.1007/s11222-024-10419-3
The von Mises–Fisher distribution is a widely used probability model in directional statistics. Wood (Commun Stat Simul Comput 23(1):157–164, 1994) suggested an algorithm, based on a rejection sampling scheme, for generating pseudo-random vectors from this distribution. This paper proposes an alternative to the rejection sampling approach for drawing pseudo-random vectors from arbitrary von Mises–Fisher distributions. A useful mixture representation is derived: a mixture of beta distributions with mixing weights that follow a confluent hypergeometric distribution. A condensed table-lookup method is adopted for sampling from the confluent hypergeometric distribution. A theoretical analysis investigates the amount of computation required to construct the condensed lookup table. Through numerical experiments, we demonstrate that the proposed algorithm outperforms the rejection-based method when generating a large number of pseudo-random vectors from the same distribution.
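For context, the Wood (1994) rejection sampler referenced above can be written compactly. The sketch below assumes a unit-length mean direction mu and uses a Householder reflection to rotate samples from the first coordinate axis onto mu; it is the baseline method, not the mixture-based alternative proposed in the paper.

```python
import numpy as np

def sample_vmf_wood(mu, kappa, size, rng=None):
    """Wood (1994) rejection sampler for the von Mises-Fisher distribution on the
    unit sphere in R^d; mu is assumed to be a unit vector."""
    rng = rng or np.random.default_rng()
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    b = (-2 * kappa + np.sqrt(4 * kappa ** 2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * np.log(1 - x0 ** 2)
    samples = np.empty((size, d))
    for i in range(size):
        while True:                                   # rejection step for the cosine w
            z = rng.beta((d - 1) / 2, (d - 1) / 2)
            w = (1 - (1 + b) * z) / (1 - (1 - b) * z)
            if kappa * w + (d - 1) * np.log(1 - x0 * w) - c >= np.log(rng.uniform()):
                break
        v = rng.standard_normal(d - 1)
        v /= np.linalg.norm(v)                        # uniform direction in the orthogonal space
        x = np.concatenate(([w], np.sqrt(1 - w ** 2) * v))
        e1 = np.zeros(d); e1[0] = 1.0
        u_h = e1 - mu                                 # Householder reflection mapping e1 to mu
        if np.linalg.norm(u_h) > 1e-12:
            u_h /= np.linalg.norm(u_h)
            x = x - 2 * (x @ u_h) * u_h
        samples[i] = x
    return samples

mu = np.array([0.0, 0.0, 1.0])
draws = sample_vmf_wood(mu, kappa=50.0, size=1000)
mean_dir = draws.mean(axis=0) / np.linalg.norm(draws.mean(axis=0))
print("mean resultant direction:", np.round(mean_dir, 3))
```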
{"title":"Novel sampling method for the von Mises–Fisher distribution","authors":"","doi":"10.1007/s11222-024-10419-3","DOIUrl":"https://doi.org/10.1007/s11222-024-10419-3","url":null,"abstract":"<h3>Abstract</h3> <p>The von Mises–Fisher distribution is a widely used probability model in directional statistics. An algorithm for generating pseudo-random vectors from this distribution was suggested by Wood (Commun Stat Simul Comput 23(1):157–164, 1994), which is based on a rejection sampling scheme. This paper proposes an alternative to this rejection sampling approach for drawing pseudo-random vectors from arbitrary von Mises–Fisher distributions. A useful mixture representation is derived, which is a mixture of beta distributions with mixing weights that follow a confluent hypergeometric distribution. A condensed table-lookup method is adopted for sampling from the confluent hypergeometric distribution. A theoretical analysis investigates the amount of computation required to construct the condensed lookup table. Through numerical experiments, we demonstrate that the proposed algorithm outperforms the rejection-based method when generating a large number of pseudo-random vectors from the same distribution.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140301529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized spherical principal component analysis
Pub Date : 2024-03-23 DOI: 10.1007/s11222-024-10413-9
Sarah Leyder, Jakob Raymaekers, Tim Verdonck
Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method including influence functions, breakdown values and asymptotic efficiencies are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.
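The spherical PCA step that the generalization builds on is short: replace each centred observation by its spatial sign (a unit-norm vector) and take eigenvectors of the resulting covariance. The generalized version uses a weighted radial transform instead of the plain spatial sign, which is not reproduced here; the column-wise median is an illustrative choice of robust centre.

```python
import numpy as np

def spherical_pca(X, n_components=2):
    """Spatial-sign PCA: eigenvectors of the spatial sign covariance matrix.
    Outliers contribute only a direction, never a large magnitude."""
    center = np.median(X, axis=0)                 # robust centre (illustrative choice)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    S = Z / np.where(norms > 0, norms, 1.0)       # spatial signs: unit-norm rows
    sscm = S.T @ S / len(S)                       # spatial sign covariance matrix
    vals, vecs = np.linalg.eigh(sscm)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]], vals[order]

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0], np.diag([9.0, 1.0, 0.1]), size=200)
X[:10] += 100 * rng.standard_normal((10, 3))      # 5% gross outliers
loadings, eigvals = spherical_pca(X)
print("first robust loading:", np.round(loadings[:, 0], 2))
```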
{"title":"Generalized spherical principal component analysis","authors":"Sarah Leyder, Jakob Raymaekers, Tim Verdonck","doi":"10.1007/s11222-024-10413-9","DOIUrl":"https://doi.org/10.1007/s11222-024-10413-9","url":null,"abstract":"<p>Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method including influence functions, breakdown values and asymptotic efficiencies are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable selection using axis-aligned random projections for partial least-squares regression
Pub Date : 2024-03-23 DOI: 10.1007/s11222-024-10417-5
In high-dimensional data modeling, variable selection plays a crucial role in improving predictive accuracy and enhancing model interpretability through sparse representation. Unfortunately, certain variable selection methods encounter challenges such as insufficient model sparsity, high computational overhead, and difficulties in handling large-scale data. Recently, axis-aligned random projection techniques have been applied to address these issues by selecting variables. However, these techniques have seen limited application in handling complex data within the regression framework. In this study, we propose a novel method, sparse partial least squares via axis-aligned random projection, designed for the analysis of high-dimensional data. Initially, axis-aligned random projection is utilized to obtain a sparse loading vector, significantly reducing computational complexity. Subsequently, partial least squares regression is conducted within the subspace of the top-ranked significant variables. The submatrices are iteratively updated until an optimal sparse partial least squares model is achieved. Comparative analysis with some state-of-the-art high-dimensional regression methods demonstrates that the proposed method exhibits superior predictive performance. To illustrate its effectiveness, we apply the method to four cases, including one simulated dataset and three real-world datasets. The results show the proposed method’s ability to identify important variables in all four cases.
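A crude illustration of the general idea, assuming axis-aligned projections are realized as random variable subsets, scikit-learn's PLSRegression as the inner fit, and cross-validated R^2 for scoring; the actual method constructs sparse loading vectors iteratively rather than scoring whole subsets, so treat this only as a sketch of the combinatorial search it replaces.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def random_subset_pls(X, y, n_proj=200, subset_size=10, n_components=2, rng=None):
    """Score many axis-aligned random projections (variable subsets) and rank
    variables by how often they appear in the best-scoring subsets."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    results = []
    for _ in range(n_proj):
        idx = rng.choice(p, subset_size, replace=False)    # axis-aligned projection
        pls = PLSRegression(n_components=n_components)
        score = cross_val_score(pls, X[:, idx], y, cv=3, scoring="r2").mean()
        results.append((score, idx))
    results.sort(key=lambda t: -t[0])
    counts = np.zeros(p)
    for _, idx in results[: n_proj // 10]:                 # top 10% of subsets
        counts[idx] += 1
    return np.argsort(-counts)                             # variables ranked by frequency

rng = np.random.default_rng(4)
n, p = 100, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, -2.0, 1.0]) + rng.standard_normal(n)
print("top-ranked variables:", random_subset_pls(X, y, rng=rng)[:5])
```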
{"title":"Variable selection using axis-aligned random projections for partial least-squares regression","authors":"","doi":"10.1007/s11222-024-10417-5","DOIUrl":"https://doi.org/10.1007/s11222-024-10417-5","url":null,"abstract":"<h3>Abstract</h3> <p>In high-dimensional data modeling, variable selection plays a crucial role in improving predictive accuracy and enhancing model interpretability through sparse representation. Unfortunately, certain variable selection methods encounter challenges such as insufficient model sparsity, high computational overhead, and difficulties in handling large-scale data. Recently, axis-aligned random projection techniques have been applied to address these issues by selecting variables. However, these techniques have seen limited application in handling complex data within the regression framework. In this study, we propose a novel method, sparse partial least squares via axis-aligned random projection, designed for the analysis of high-dimensional data. Initially, axis-aligned random projection is utilized to obtain a sparse loading vector, significantly reducing computational complexity. Subsequently, partial least squares regression is conducted within the subspace of the top-ranked significant variables. The submatrices are iteratively updated until an optimal sparse partial least squares model is achieved. Comparative analysis with some state-of-the-art high-dimensional regression methods demonstrates that the proposed method exhibits superior predictive performance. To illustrate its effectiveness, we apply the method to four cases, including one simulated dataset and three real-world datasets. The results show the proposed method’s ability to identify important variables in all four cases.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An expectile computation cookbook
Pub Date : 2024-03-23 DOI: 10.1007/s11222-024-10403-x
A substantial body of work in the last 15 years has shown that expectiles constitute an excellent candidate for becoming a standard tool in probabilistic and statistical modeling. Surprisingly, the question of how expectiles may be efficiently calculated has been left largely untouched. We fill this gap by, first, providing a general outlook on the computation of expectiles that relies on the knowledge of analytic expressions of the underlying distribution function and mean residual life function. We distinguish between discrete distributions, for which an exact calculation is always feasible, and continuous distributions, where a Newton–Raphson approximation algorithm can be implemented and a list of exceptional distributions whose expectiles are in analytic form can be given. When the distribution function and/or the mean residual life is difficult to compute, Monte-Carlo algorithms are introduced, based on an exact calculation of sample expectiles and on the use of control variates to improve computational efficiency. We discuss the relevance of our findings to statistical practice and provide numerical evidence of the performance of the considered methods.
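For the sample case, the Newton–Raphson idea takes only a few lines: the tau-expectile is the root of an asymmetrically weighted first-order condition whose derivative is available in closed form. The generic sketch below is not the paper's code.

```python
import numpy as np

def sample_expectile(x, tau, tol=1e-10, max_iter=100):
    """Newton-Raphson for the tau-expectile of a sample x.
    The expectile e solves  tau * E[(X - e)_+] = (1 - tau) * E[(e - X)_+]."""
    x = np.asarray(x, dtype=float)
    e = x.mean()                                   # the 0.5-expectile is the mean
    for _ in range(max_iter):
        psi = tau * np.mean(np.maximum(x - e, 0)) - (1 - tau) * np.mean(np.maximum(e - x, 0))
        dpsi = -tau * np.mean(x > e) - (1 - tau) * np.mean(x <= e)
        step = psi / dpsi
        e -= step
        if abs(step) < tol:
            break
    return e

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=100_000)
print("tau=0.5 expectile (close to the sample mean):", round(sample_expectile(x, 0.5), 3))
print("tau=0.9 expectile:", round(sample_expectile(x, 0.9), 3))
```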
{"title":"An expectile computation cookbook","authors":"","doi":"10.1007/s11222-024-10403-x","DOIUrl":"https://doi.org/10.1007/s11222-024-10403-x","url":null,"abstract":"<h3>Abstract</h3> <p>A substantial body of work in the last 15 years has shown that expectiles constitute an excellent candidate for becoming a standard tool in probabilistic and statistical modeling. Surprisingly, the question of how expectiles may be efficiently calculated has been left largely untouched. We fill this gap by, first, providing a general outlook on the computation of expectiles that relies on the knowledge of analytic expressions of the underlying distribution function and mean residual life function. We distinguish between discrete distributions, for which an exact calculation is always feasible, and continuous distributions, where a Newton–Raphson approximation algorithm can be implemented and a list of exceptional distributions whose expectiles are in analytic form can be given. When the distribution function and/or the mean residual life is difficult to compute, Monte-Carlo algorithms are introduced, based on an exact calculation of sample expectiles and on the use of control variates to improve computational efficiency. We discuss the relevance of our findings to statistical practice and provide numerical evidence of the performance of the considered methods.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140205561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simultaneous estimation and variable selection for a non-crossing multiple quantile regression using deep neural networks
Pub Date : 2024-03-22 DOI: 10.1007/s11222-024-10418-4
Jungmin Shin, Seunghyun Gwak, Seung Jun Shin, Sungwan Bang
In this paper, we present the DNN-NMQR estimator, an approach that utilizes a deep neural network structure to solve multiple quantile regression problems. When estimating multiple quantiles, the approach leverages the structural characteristics of the DNN to improve estimation by encouraging shared learning across the different quantiles. The method also effectively addresses quantile crossing through a penalization scheme. To refine the methodology, we introduce a convolution-type quadratic smoothing function, ensuring that the objective function remains differentiable throughout. Furthermore, we provide a brief discussion of the convergence analysis of DNN-NMQR, drawing on the concept of the neural tangent kernel. For the high-dimensional case, we propose the (A)GDNN-NMQR estimator, which applies group-wise $L_1$-type regularization methods and enjoys the advantages of quantile estimation and variable selection simultaneously. We extensively validate all of the proposed methods through numerical experiments and real data analysis.
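Two of the concrete ingredients are easy to illustrate in isolation: a check loss smoothed by convolution with a uniform kernel on [-h, h], which yields a quadratic segment near zero and hence a differentiable objective, and a generic penalty on crossing quantile predictions. The paper's exact smoothing function and crossing penalty may differ from this sketch.

```python
import numpy as np

def smoothed_pinball(u, tau, h=0.1):
    """Check (pinball) loss convolved with a Uniform(-h, h) kernel:
    quadratic on [-h, h], linear outside, continuously differentiable."""
    return np.where(
        u >= h, tau * u,
        np.where(u <= -h, (tau - 1) * u,
                 u ** 2 / (4 * h) + (tau - 0.5) * u + h / 4),
    )

def crossing_penalty(q_pred):
    """Penalise quantile crossing: q_pred has one column per quantile level,
    ordered from smallest to largest tau; negative increments are crossings."""
    diffs = np.diff(q_pred, axis=1)
    return np.sum(np.maximum(-diffs, 0.0))

u = np.linspace(-1, 1, 5)
print(np.round(smoothed_pinball(u, tau=0.9), 3))
print(crossing_penalty(np.array([[0.1, 0.5, 0.4]])))   # last two predicted quantiles cross -> positive penalty
```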
{"title":"Simultaneous estimation and variable selection for a non-crossing multiple quantile regression using deep neural networks","authors":"Jungmin Shin, Seunghyun Gwak, Seung Jun Shin, Sungwan Bang","doi":"10.1007/s11222-024-10418-4","DOIUrl":"https://doi.org/10.1007/s11222-024-10418-4","url":null,"abstract":"<p>In this paper, we present the DNN-NMQR estimator, an approach that utilizes a deep neural network structure to solve multiple quantile regression problems. When estimating multiple quantiles, our approach leverages the structural characteristics of DNN to enhance estimation results by encouraging shared learning across different quantiles through DNN-NMQR. Also, this method effectively addresses quantile crossing issues through the penalization method. To refine our methodology, we introduce a convolution-type quadratic smoothing function, ensuring that the objective function remains differentiable throughout. Furthermore, we provide a brief discussion on the convergence analysis of DNN-NMQR, drawing on the concept of the neural tangent kernel. For a high-dimensional case, we propose the (A)GDNN-NMQR estimator, which applies group-wise <span>(L_1)</span>-type regularization methods and enjoys the advantages of quantile estimation and variable selection simultaneously. We extensively validate all of our proposed methods through numerical experiments and real data analysis.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140201225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}