
Computational Statistics: Latest Publications

Two-stage regression spline modeling based on local polynomial kernel regression
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-05-01 · DOI: 10.1007/s00180-024-01498-x
Hamid Mraoui, Ahmed El-Alaoui, Souad Bechrouri, Nezha Mohaoui, Abdelilah Monir

This paper introduces a new nonparametric regression estimator based on a local quasi-interpolation spline method. The model combines a B-spline basis with simple local polynomial regression, via a blossoming approach, to produce a reduced-rank, spline-like smoother. Different coefficient functionals are allowed to have different smoothing parameters (bandwidths) when the function has varying smoothness. In addition, the number and location of the knots of this estimator are not fixed. In practice, one may employ a modest number of basis functions and then choose the smoothing parameter as the minimizer of a selection criterion. In simulations, the approach is very competitive with P-spline and smoothing-spline methods. Simulated data and a real data example illustrate the effectiveness of the proposed method.
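A minimal two-stage sketch in the spirit of this abstract, assuming only the generic recipe (stage one: a local-linear kernel fit at a modest set of knots; stage two: a spline smoother through the stage-one fits). It is not the authors' blossoming-based quasi-interpolation construction, and the bandwidth, kernel, and knot count are illustrative choices.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def local_linear(x0, x, y, h):
    """Local-linear kernel estimate of E[y | x = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    sw = np.sqrt(w)
    X = np.column_stack([np.ones_like(x), x - x0])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

knots = np.linspace(0.02, 0.98, 15)              # a modest number of knots
stage1 = np.array([local_linear(t, x, y, h=0.08) for t in knots])
spline = make_interp_spline(knots, stage1, k=3)  # stage 2: spline through stage-1 fits

grid = np.linspace(0.02, 0.98, 200)
print(spline(grid[:5]).round(3))                 # smoothed fit on a fine grid
```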

Citations: 0
Advancements in reliability estimation for the exponentiated Pareto distribution: a comparison of classical and Bayesian methods with lower record values
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-04-29 · DOI: 10.1007/s00180-024-01497-y
Shubham Saini

Estimating the reliability of multicomponent systems is crucial in various engineering and reliability analysis applications. This paper investigates multicomponent stress–strength reliability estimation using lower record values, specifically for the exponentiated Pareto distribution. We compare classical estimation techniques, such as maximum likelihood estimation, with Bayesian estimation methods. Under Bayesian estimation, we employ Markov Chain Monte Carlo techniques and Tierney–Kadane’s approximation to obtain the posterior distribution of the reliability parameter. To evaluate the performance of the proposed estimation approaches, we conduct a comprehensive simulation study, considering various system configurations and sample sizes. Additionally, we analyze real data to illustrate the practical applicability of our methods. The proposed methodologies provide valuable insights for engineers and reliability analysts in accurately assessing the reliability of multicomponent systems using lower record values.
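A hedged Monte Carlo sketch of the quantity being estimated, the multicomponent stress–strength reliability R_{s,k} = P(at least s of k strengths exceed the common stress), with strengths and stress drawn from exponentiated Pareto distributions. The CDF parameterization F(x) = (1 - (1+x)^{-lambda})^{alpha} and all parameter values are assumptions for illustration; the paper's record-value-based estimators are not reproduced here.

```python
import numpy as np

def r_exp_pareto(alpha, lam, size, rng):
    """Inverse-CDF sampling from an exponentiated Pareto distribution."""
    u = rng.uniform(size=size)
    return (1.0 - u ** (1.0 / alpha)) ** (-1.0 / lam) - 1.0

def reliability_mc(s, k, alpha_x, lam_x, alpha_y, lam_y, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    strengths = r_exp_pareto(alpha_x, lam_x, (n, k), rng)   # X_1, ..., X_k
    stress = r_exp_pareto(alpha_y, lam_y, (n, 1), rng)      # common stress Y
    exceed = (strengths > stress).sum(axis=1)               # how many components survive
    return (exceed >= s).mean()                              # estimate of R_{s,k}

# illustrative configuration: at least 2 of 4 components must withstand the stress
print(reliability_mc(s=2, k=4, alpha_x=2.5, lam_x=1.5, alpha_y=1.0, lam_y=1.5))
```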

Citations: 0
Maximizing adjusted covariance: new supervised dimension reduction for classification
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-04-02 · DOI: 10.1007/s00180-024-01472-7
Hyejoon Park, Hyunjoong Kim, Yung-Seop Lee

This study proposes a new linear dimension reduction technique called Maximizing Adjusted Covariance (MAC), which is suitable for supervised classification. The new approach adjusts the covariance matrix between input and target variables using the within-class sum of squares, thereby promoting class separation after linear dimension reduction. MAC has a low computational cost and can complement existing linear dimensionality reduction techniques for classification. In this study, the classification performance of MAC was compared with that of existing linear dimension reduction methods on 44 datasets. In most of the classification models used in the experiment, the MAC dimension reduction method showed better classification accuracy and F1 score than other linear dimension reduction methods.
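A hedged sketch of one plausible reading of the abstract: form the cross-covariance between the centered inputs and one-hot class targets, adjust it by the within-class sum of squares, and project onto the leading singular directions. The exact MAC criterion is defined in the paper; the scaling used below is an assumption made for illustration.

```python
import numpy as np

def mac_like_projection(X, y, n_components=2, eps=1e-8):
    X = X - X.mean(axis=0)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)       # one-hot targets
    Y = Y - Y.mean(axis=0)
    cov_xy = X.T @ Y / X.shape[0]                              # p x K cross-covariance
    wss = np.zeros(X.shape[1])                                 # within-class sum of squares
    for c in classes:
        Xc = X[y == c]
        wss += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    adj = cov_xy / np.sqrt(wss + eps)[:, None]                 # "adjusted" covariance
    U, _, _ = np.linalg.svd(adj, full_matrices=False)
    return U[:, :n_components]                                 # projection directions

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(1.5, 1, (50, 5))])
y = np.repeat([0, 1], 50)
W = mac_like_projection(X, y, n_components=1)
print((X @ W)[:3])   # reduced, class-separating representation
```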

Citations: 0
A class of transformed joint quantile time series models with applications to health studies
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-04-01 · DOI: 10.1007/s00180-024-01484-3
Fahimeh Tourani-Farani, Zeynab Aghabazaz, Iraj Kazemi

Extensions of quantile regression modeling for time series analysis are extensively employed in medical and health studies. This study introduces a specific class of transformed quantile-dispersion regression models for non-stationary time series. These models possess the flexibility to incorporate the time-varying structure into the model specification, enabling precise predictions for future decisions. Our proposed modeling methodology applies to dynamic processes characterized by high variation and possible periodicity, relying on a non-linear framework. Additionally, unlike the transformed time series model, our approach directly interprets the regression parameters concerning the initial response. For computational purposes, we present an iteratively reweighted least squares algorithm. To assess the performance of our model, we conduct simulation experiments. To illustrate the modeling strategy, we analyze time-series measurements of influenza infection and daily COVID-19 deaths.
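Since the abstract mentions an iteratively reweighted least squares algorithm, here is a minimal IRLS sketch for plain quantile regression under the check loss. It only illustrates the reweighting idea; it is not the authors' transformed joint quantile-dispersion time-series model, and the weight floor eps is an arbitrary numerical safeguard.

```python
import numpy as np

def quantile_irls(X, y, tau=0.5, n_iter=50, eps=1e-6):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]              # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        # weight |tau - 1{r<0}| / |r| approximates the check loss by weighted least squares
        w = np.abs(tau - (r < 0)) / np.maximum(np.abs(r), eps)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
print(quantile_irls(X, y, tau=0.9))   # intercept shifts toward the 0.9 quantile
```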

Citations: 0
A smoothed semiparametric likelihood for estimation of nonparametric finite mixture models with a copula-based dependence structure
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-03-27 · DOI: 10.1007/s00180-024-01483-4
Michael Levine, Gildas Mazo

In this manuscript, we consider a finite multivariate nonparametric mixture model where the dependence between the marginal densities is modeled using the copula device. Pseudo expectation–maximization (EM) stochastic algorithms were recently proposed to estimate all of the components of this model under a location-scale constraint on the marginals. Here, we introduce a deterministic algorithm that seeks to maximize a smoothed semiparametric likelihood. No location-scale assumption is made about the marginals. The algorithm is monotonic in one special case, and, in another, leads to “approximate monotonicity”—whereby the difference between successive values of the objective function becomes non-negative up to an additive term that becomes negligible after a sufficiently large number of iterations. The behavior of this algorithm is illustrated on several simulated and real datasets. The results suggest that, under suitable conditions, the proposed algorithm may indeed be monotonic in general. A discussion of the results and some possible future research directions round out our presentation.
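A much-simplified, hedged sketch of a smoothed EM-type update for a bivariate two-component nonparametric mixture, treating the coordinates as independent within each component. The copula-based dependence that is the point of the paper is omitted, and the fixed Gaussian bandwidth is an illustrative choice.

```python
import numpy as np

def gauss_kde(x_eval, x_obs, w, h):
    """Weighted Gaussian kernel density estimate evaluated at x_eval."""
    z = (x_eval[:, None] - x_obs[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (k * w[None, :]).sum(axis=1) / (w.sum() * h)

def np_mixture_em(X, n_iter=30, h=0.3, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    post = rng.uniform(0.3, 0.7, n)              # initial P(component 1 | x_i)
    for _ in range(n_iter):
        pi1 = post.mean()                        # mixing proportion update
        f1 = np.ones(n)
        f0 = np.ones(n)
        for j in range(d):                       # product of weighted marginal KDEs
            f1 *= gauss_kde(X[:, j], X[:, j], post, h)
            f0 *= gauss_kde(X[:, j], X[:, j], 1 - post, h)
        post = pi1 * f1 / (pi1 * f1 + (1 - pi1) * f0)
    return pi1, post

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(3, 1, (150, 2))])
print(np_mixture_em(X)[0])   # estimated mixing proportion, roughly 0.5
```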

Citations: 0
A subspace aggregating algorithm for accurate classification
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-03-09 · DOI: 10.1007/s00180-024-01476-3
Saeid Amiri, Reza Modarres

We present a technique for learning via aggregation in supervised classification. The new method improves classification performance, regardless of which classifier is at its core. This approach exploits information hidden in subspaces through combinations of aggregated variables and is applicable to high-dimensional data sets. We provide algorithms that randomly divide the variables into smaller subsets and permute them before applying a classification method to each subset. We combine the resulting classes to predict the class membership. Theoretical and simulation analyses consistently demonstrate the high accuracy of our classification methods. In comparison to aggregating observations through sampling, our approach proves to be significantly more effective. Through extensive simulations, we evaluate the accuracy of various classification methods. To further illustrate the effectiveness of our techniques, we apply them to five real-world data sets.
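A hedged sketch of the aggregation idea: randomly partition the variables into subsets, fit a base classifier on each subset, and combine the per-subset predictions by majority vote. The permutation and combination details of the proposed algorithm are in the paper; the base learner, subset count, and binary-label voting below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def subspace_vote(X_train, y_train, X_test, n_subsets=5, seed=0):
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    subsets = np.array_split(rng.permutation(p), n_subsets)   # random variable split
    votes = []
    for cols in subsets:
        clf = LogisticRegression(max_iter=1000).fit(X_train[:, cols], y_train)
        votes.append(clf.predict(X_test[:, cols]))
    votes = np.array(votes)
    return (votes.mean(axis=0) >= 0.5).astype(int)             # majority vote, 0/1 labels

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20)); X[:150] += 0.8
y = np.repeat([1, 0], 150)
idx = rng.permutation(300)
X, y = X[idx], y[idx]
pred = subspace_vote(X[:200], y[:200], X[200:])
print((pred == y[200:]).mean())                                # hold-out accuracy
```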

Citations: 0
Imbalanced data sampling design based on grid boundary domain for big data
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-03-08 · DOI: 10.1007/s00180-024-01471-8

Abstract

The data distribution is often associated with an a priori known probability, and the occurrence probability of the events of interest is small, so a large amount of imbalanced data appears in sociology, economics, engineering, and various other fields. The existing over- and under-sampling methods are widely used in imbalanced data classification problems, but over-sampling leads to overfitting, and under-sampling ignores effective information. We propose a new sampling design algorithm called the neighbor grid of boundary mixed-sampling (NGBM), which focuses on the boundary information. This paper obtains the classification boundary information through grid boundary domain identification, thereby determining the importance of the samples. On this basis, the synthetic minority oversampling technique is applied to the boundary grid, and random under-sampling is applied to the other grids. With the help of this mixed-sampling strategy, more of the important classification-boundary information, especially information for identifying positive samples, is extracted. Numerical simulations and real data analysis are used to discuss the parameter-setting strategy of the NGBM and to illustrate its advantages on imbalanced data, as well as its practical applications.
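A hedged two-dimensional sketch of the mixed-sampling idea: bin the feature space into a grid, treat a cell containing both classes as a boundary cell, oversample the minority class inside boundary cells by SMOTE-style interpolation, and randomly undersample the majority class in the remaining cells. The grid size and sampling rates are illustrative; the paper's grid boundary domain identification is more refined than this.

```python
import numpy as np

def ngbm_like_resample(X, y, minority=1, bins=10, under_rate=0.3, seed=0):
    rng = np.random.default_rng(seed)
    # assign each 2-D point to a grid cell
    edges = [np.linspace(X[:, j].min(), X[:, j].max(), bins + 1) for j in range(2)]
    cell = np.stack([np.clip(np.digitize(X[:, j], edges[j]) - 1, 0, bins - 1)
                     for j in range(2)], axis=1)
    cell_id = cell[:, 0] * bins + cell[:, 1]

    keep, synthetic = [], []
    for c in np.unique(cell_id):
        idx = np.where(cell_id == c)[0]
        labels = y[idx]
        if len(np.unique(labels)) > 1:                    # boundary cell: both classes present
            keep.extend(idx)
            mino = idx[labels == minority]
            if len(mino) >= 2:                            # SMOTE-style interpolation
                a = rng.choice(mino, size=len(mino))
                b = rng.choice(mino, size=len(mino))
                lam = rng.uniform(size=(len(mino), 1))
                synthetic.append(X[a] + lam * (X[b] - X[a]))
        else:                                             # pure cell: undersample majority
            if labels[0] == minority:
                keep.extend(idx)
            else:
                m = max(1, int(under_rate * len(idx)))
                keep.extend(rng.choice(idx, size=m, replace=False))

    X_new = np.vstack([X[keep]] + synthetic) if synthetic else X[keep]
    y_new = np.concatenate([y[keep], np.full(sum(len(s) for s in synthetic), minority)])
    return X_new, y_new

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(2.0, 0.7, (40, 2))])
y = np.concatenate([np.zeros(500, int), np.ones(40, int)])
Xr, yr = ngbm_like_resample(X, y)
print(np.bincount(yr))   # class counts after mixed resampling
```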

Citations: 0
Sparse estimation of linear model via Bayesian method
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-03-04 · DOI: 10.1007/s00180-024-01474-5

Abstract

This paper considers the sparse estimation problem of regression coefficients in the linear model. Since global–local shrinkage priors do not allow the regression coefficients to be estimated as exactly zero, we propose three threshold rules and compare their contraction properties; we also pair these rules with the popular horseshoe prior and horseshoe+ prior, which are commonly regarded as global–local shrinkage priors. The hierarchical prior expressions for the horseshoe prior and the horseshoe+ prior are obtained, and the full conditional posterior distributions of all parameters are given for algorithm implementation. Simulation studies indicate that the horseshoe/horseshoe+ prior with the threshold rules is superior to spike-and-slab models. Finally, a real data analysis demonstrates the effectiveness of the proposed method for variable selection.
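A hedged illustration of turning shrinkage-prior posterior draws into genuinely sparse estimates with a threshold rule. The rule below (zero out any coefficient whose equal-tailed credible interval covers zero) is a common generic choice and is not necessarily one of the three rules proposed in the paper; the posterior draws are simulated stand-ins for MCMC output under a horseshoe-type prior.

```python
import numpy as np

def credible_interval_threshold(draws, level=0.95):
    """draws: (n_samples, p) posterior draws of regression coefficients."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    est = draws.mean(axis=0)
    est[(lo <= 0) & (hi >= 0)] = 0.0        # interval covers zero -> set coefficient to zero
    return est

rng = np.random.default_rng(0)
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.8])
# simulated stand-in for MCMC output: tight draws around signals, wide around noise
draws = true_beta + rng.normal(scale=[0.1, 0.3, 0.1, 0.3, 0.3, 0.2], size=(4000, 6))
print(credible_interval_threshold(draws))   # exact zeros where the interval covers zero
```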

Citations: 0
Degree selection methods for curve estimation via Bernstein polynomials
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-03-02 · DOI: 10.1007/s00180-024-01473-6

Abstract

Bernstein Polynomial (BP) bases can uniformly approximate any continuous function based on observed noisy samples. However, a persistent challenge is the data-driven selection of a suitable degree for the BPs. In the absence of noise, asymptotic theory suggests that a larger degree leads to better approximation. In the presence of noise, a larger degree still reduces bias but also results in larger variance, because more parameters must be estimated. Thus, a balance in the classic bias-variance trade-off is essential. The main objective of this work is to determine the minimum possible degree of the approximating BPs using probabilistic methods that are robust to various shapes of an unknown continuous function. Beyond offering theoretical guidance, the paper includes numerical illustrations to address the issue of determining a suitable degree for BPs in approximating arbitrary continuous functions.
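A hedged sketch of least-squares fitting in a Bernstein polynomial basis on [0, 1], with the degree chosen by cross-validated squared error. This only illustrates the bias-variance trade-off the abstract describes; the probabilistic degree-selection rules developed in the paper are different, and the degree range and fold count are arbitrary.

```python
import numpy as np
from scipy.special import comb

def bernstein_basis(x, m):
    """Degree-m Bernstein basis B_{k,m}(x) = C(m,k) x^k (1-x)^(m-k), columns k = 0..m."""
    k = np.arange(m + 1)
    return comb(m, k) * x[:, None] ** k * (1 - x[:, None]) ** (m - k)

def cv_error(x, y, m, folds=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    errs = []
    for part in np.array_split(idx, folds):
        train = np.setdiff1d(idx, part)
        B_tr, B_te = bernstein_basis(x[train], m), bernstein_basis(x[part], m)
        coef, *_ = np.linalg.lstsq(B_tr, y[train], rcond=None)
        errs.append(np.mean((y[part] - B_te @ coef) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = np.sin(3 * np.pi * x) + rng.normal(scale=0.25, size=x.size)
degrees = range(2, 31)
best = min(degrees, key=lambda m: cv_error(x, y, m))
print("selected degree:", best)
```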

Citations: 0
Automatic piecewise linear regression
IF 1.3 · CAS Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-03-01 · DOI: 10.1007/s00180-024-01475-4
Mathias von Ottenbreit, Riccardo De Bin

Regression modelling often presents a trade-off between predictiveness and interpretability. Highly predictive and popular tree-based algorithms such as Random Forest and boosted trees predict the outcome of new observations very well, but the effect of the predictors on the result is hard to interpret. Highly interpretable algorithms like linear effect-based boosting and MARS, on the other hand, are typically less predictive. Here we propose a novel regression algorithm, automatic piecewise linear regression (APLR), that combines the predictiveness of a boosting algorithm with the interpretability of a MARS model. In addition, as a boosting algorithm, it automatically handles variable selection, and, as a MARS-based approach, it takes into account non-linear relationships and possible interaction terms. We show on simulated and real data examples how APLR’s performance is comparable to that of the top-performing approaches in terms of prediction, while offering an easy way to interpret the results. APLR has been implemented in C++ and wrapped in a Python package as a Scikit-learn compatible estimator.
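A hedged usage sketch of the scikit-learn-compatible estimator mentioned in the abstract, assuming the Python package is installed as aplr and exposes an APLRRegressor with the usual fit/predict interface; the package name, class name, and constructor defaults are assumptions based on the abstract rather than verified details.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from aplr import APLRRegressor   # assumed import path for the APLR Python wrapper

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# piecewise-linear and interaction structure that APLR is designed to pick up
y = 2 * X[:, 0] + np.maximum(X[:, 1], 0) + 0.5 * X[:, 2] * X[:, 3] + rng.normal(size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = APLRRegressor()          # default settings; tuning parameters are package-specific
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(np.mean((pred - y_te) ** 2))   # hold-out mean squared error
```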

Citations: 0