
Latest articles in Statistics and Computing

Efficient estimation and correction of selection-induced bias with order statistics
IF 2.2 | CAS Region 2 (Mathematics) | Q2 (Computer Science, Theory & Methods) | Pub Date: 2024-06-12 | DOI: 10.1007/s11222-024-10442-4
Yann McLatchie, Aki Vehtari

Model selection aims to identify a sufficiently well-performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible bias when the cross-validation estimates of predictive performance are marred by excessive noise. In finite data regimes, cross-validated estimates can encourage the statistician to select one model over another when it is not actually better for future data. While this bias remains negligible in the case of few models, when the pool of candidates grows, and model selection decisions are compounded (as in step-wise selection), the expected magnitude of selection-induced bias is likely to grow too. This paper introduces an efficient approach to estimate and correct selection-induced bias based on order statistics. Numerical experiments demonstrate the reliability of our approach in estimating both selection-induced bias and over-fitting along compounded model selection decisions, with specific application to forward search. This work represents a lightweight alternative to more computationally expensive approaches to correcting selection-induced bias, such as nested cross-validation and the bootstrap. Our approach rests on several theoretical assumptions, and we provide a diagnostic to help understand when these may not be valid and when to fall back on safer, albeit more computationally expensive, approaches. The accompanying code facilitates its practical implementation and fosters further exploration in this area.
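
To make the order-statistics idea concrete, here is a minimal sketch assuming, more crudely than the paper, that the K candidate models are all equivalent to the reference and that their cross-validated utility differences behave like iid zero-mean Gaussians with a common standard error; the function `expected_max_gaussian`, the choice K = 50 and the standard error of 2 are illustrative, not the authors' estimator.

```python
# Sketch: under the null that all K candidates match the baseline, the expected
# value of the *selected* (maximal) elpd-difference estimate is a proxy for the
# selection-induced bias.  Names and constants here are illustrative only.
import numpy as np

def expected_max_gaussian(k, sigma=1.0, n_mc=100_000, seed=None):
    """Monte Carlo estimate of E[max of k iid N(0, sigma^2)]."""
    rng = np.random.default_rng(seed)
    return sigma * rng.standard_normal((n_mc, k)).max(axis=1).mean()

k, se = 50, 2.0                           # 50 candidates, elpd-difference SE of 2
bias_hat = expected_max_gaussian(k, sigma=se, seed=0)
print(f"approximate selection-induced bias for k={k}: {bias_hat:.2f}")
print("crude upper bound sigma*sqrt(2 log k):", round(se * np.sqrt(2 * np.log(k)), 2))
```

The sqrt(2 log K) bound illustrates why the bias grows slowly but steadily with the number of candidates compared.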

Citations: 0
Bias-reduced and variance-corrected asymptotic Gaussian inference about extreme expectiles
IF 2.2 | CAS Region 2 (Mathematics) | Q2 (Computer Science, Theory & Methods) | Pub Date: 2024-06-07 | DOI: 10.1007/s11222-023-10359-4
A. Daouia, Gilles Stupfler, Antoine Usseglio‐Carleve
{"title":"Bias-reduced and variance-corrected asymptotic Gaussian inference about extreme expectiles","authors":"A. Daouia, Gilles Stupfler, Antoine Usseglio‐Carleve","doi":"10.1007/s11222-023-10359-4","DOIUrl":"https://doi.org/10.1007/s11222-023-10359-4","url":null,"abstract":"","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141370510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Jittering and clustering: strategies for the construction of robust designs
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-06-04 | DOI: 10.1007/s11222-024-10436-2
Douglas P. Wiens

We discuss, and give examples of, methods for randomly implementing some minimax robust designs from the literature. These have the advantage, over their deterministic counterparts, of having bounded maximum loss in large and very rich neighbourhoods of the, almost certainly inexact, response model fitted by the experimenter. Their maximum loss rivals that of the theoretically best possible, but not implementable, minimax designs. The procedures are then extended to more general robust designs. For two-dimensional designs we sample from contractions of Voronoi tessellations, generated by selected basis points, which partition the design space. These ideas are then extended to k-dimensional designs for general k.
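
As a rough illustration of the jittering construction, the sketch below draws one design point per basis point by rejection-sampling a uniform point inside that point's Voronoi cell and contracting the draw towards the basis point. The contraction factor `lam`, the unit-square design space and the one-point-per-cell allocation are assumptions made for illustration, not the paper's procedure.

```python
# Sketch of sampling from contracted Voronoi cells of a set of basis points.
import numpy as np

def jittered_design(basis, lam=0.5, rng=None):
    rng = np.random.default_rng(rng)
    design = np.empty_like(basis, dtype=float)
    for i, b in enumerate(basis):
        while True:
            cand = rng.uniform(0.0, 1.0, size=basis.shape[1])
            # cand lies in b's Voronoi cell iff b is its nearest basis point
            if np.argmin(np.linalg.norm(basis - cand, axis=1)) == i:
                break
        design[i] = b + lam * (cand - b)   # contract the cell towards its centre
    return design

basis = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8], [0.5, 0.5]])
print(jittered_design(basis, lam=0.5, rng=1))
```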

Citations: 0
Testing the goodness-of-fit of the stable distributions with applications to German stock index data and Bitcoin cryptocurrency data
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-06-03 | DOI: 10.1007/s11222-024-10441-5
Ruhul Ali Khan, Ayan Pal, Debasis Kundu

Outlier-prone data sets are of immense interest in diverse areas including economics, finance, statistical physics, signal processing, telecommunications and so on. Stable laws (also known as α-stable laws) are often found to be useful in modeling outlier-prone data containing important information and exhibiting heavy-tailed phenomena. In this article, an asymptotic distribution of an unbiased and consistent estimator of the stability index α is proposed based on the jackknife empirical likelihood (JEL) and adjusted JEL methods. Next, using the sum-preserving property of stable random variables and exploiting U-statistic theory, we have developed a goodness-of-fit test procedure for α-stable distributions where the stability index α is specified. Extensive simulation studies are performed in order to assess the finite sample performance of the proposed test. Finally, two appealing real life data examples related to the daily closing price of the German Stock Index and Bitcoin cryptocurrency are analysed in detail for illustration purposes.
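
The sum-preserving property underpinning the test can be illustrated with a naive scale-ratio estimate of α: for iid symmetric α-stable variables, X1 + X2 has the same shape but scale inflated by 2^(1/α). The sketch below uses the interquartile range as a convenient scale proxy; it only illustrates that property and is not the authors' JEL-based procedure (sample size, seed and the IQR choice are assumptions).

```python
# Naive alpha estimate from the sum-preserving property of stable laws.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
alpha_true = 1.5
x = levy_stable.rvs(alpha_true, beta=0.0, size=20_000, random_state=rng)

def iqr(a):
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

pair_sums = x[::2] + x[1::2]              # sums of disjoint pairs
ratio = iqr(pair_sums) / iqr(x)           # should be close to 2**(1/alpha)
alpha_hat = np.log(2) / np.log(ratio)
print(f"true alpha = {alpha_true}, rough estimate = {alpha_hat:.2f}")
```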

Citations: 0
Insufficient Gibbs sampling
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-05-31 | DOI: 10.1007/s11222-024-10423-7
Antoine Luciano, Christian P. Robert, Robin J. Ryder

In some applied scenarios, the availability of complete data is restricted, often due to privacy concerns; only aggregated, robust and inefficient statistics derived from the data are made accessible. These robust statistics are not sufficient, but they demonstrate reduced sensitivity to outliers and offer enhanced data protection due to their higher breakdown point. We consider a parametric framework and propose a method to sample from the posterior distribution of parameters conditioned on various robust and inefficient statistics: specifically, the pairs (median, MAD) or (median, IQR), or a collection of quantiles. Our approach leverages a Gibbs sampler and simulates latent augmented data, which facilitates simulation from the posterior distribution of parameters belonging to specific families of distributions. A by-product of these samples from the joint posterior distribution of parameters and data given the observed statistics is that we can estimate Bayes factors based on observed statistics via bridge sampling. We validate and outline the limitations of the proposed methods through toy examples and an application to real-world income data.
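
To convey the data-augmentation structure, here is a deliberately crude sketch for a normal model observed only through (median, MAD): it alternates (i) a latent sample coerced, via an affine transform, to match the observed statistics exactly with (ii) standard conjugate parameter updates under a Jeffreys prior. The coercion step is a simplification and not the paper's exact conditional simulation; the normal model, sample size and prior are illustrative assumptions.

```python
# Crude sketch of Gibbs-style data augmentation given only (median, MAD).
import numpy as np
from scipy import stats

def mad(a):
    return np.median(np.abs(a - np.median(a)))

def toy_gibbs(m_obs, s_obs, n, n_iter=2000, rng=None):
    rng = np.random.default_rng(rng)
    mu, sigma = m_obs, s_obs / 0.6745          # moment-style starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # 1. latent data given parameters, coerced to match the observed stats
        x = rng.normal(mu, sigma, size=n)
        z = (x - np.median(x)) / mad(x) * s_obs + m_obs
        # 2. parameters given latent data (Jeffreys prior, standard conjugacy)
        sigma2 = stats.invgamma.rvs((n - 1) / 2,
                                    scale=np.sum((z - z.mean()) ** 2) / 2,
                                    random_state=rng)
        sigma = np.sqrt(sigma2)
        mu = rng.normal(z.mean(), sigma / np.sqrt(n))
        draws[t] = mu, sigma
    return draws

draws = toy_gibbs(m_obs=0.0, s_obs=1.0, n=100, rng=0)
print(draws[500:].mean(axis=0))   # rough posterior means of (mu, sigma)
```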

Citations: 0
Optimization of the generalized covariance estimator in noncausal processes
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-05-31 | DOI: 10.1007/s11222-024-10437-1
Gianluca Cubadda, Francesco Giancaterini, Alain Hecq, Joann Jasiak

This paper investigates the performance of routinely used optimization algorithms in application to the Generalized Covariance estimator (GCov) for univariate and multivariate mixed causal and noncausal models. The GCov is a semi-parametric estimator with an objective function based on nonlinear autocovariances to identify causal and noncausal orders. When the number and type of nonlinear autocovariances included in the objective function are insufficient/inadequate, or the error density is too close to the Gaussian, identification issues can arise. These issues result in local minima in the objective function, which correspond to parameter values associated with incorrect causal and noncausal orders. Then, depending on the starting point and the optimization algorithm employed, the algorithm can converge to a local minimum. The paper proposes the Simulated Annealing (SA) optimization algorithm as an alternative to conventional numerical optimization methods. The results demonstrate that SA performs well in its application to mixed causal and noncausal models, successfully eliminating the effects of local minima. The proposed approach is illustrated by an empirical study of a bivariate series of commodity prices.
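
The local-minima issue and the appeal of annealing can be reproduced on a generic multimodal surface (the Rastrigin function, not the GCov objective): a local optimizer started in the wrong basin stalls at a poor solution, while `scipy.optimize.dual_annealing` reaches the global minimum. The test function, starting point and bounds below are illustrative assumptions.

```python
# Local search vs. an annealing-type global search on a multimodal objective.
import numpy as np
from scipy.optimize import minimize, dual_annealing

def rastrigin(theta):
    theta = np.asarray(theta)
    return 10 * theta.size + np.sum(theta**2 - 10 * np.cos(2 * np.pi * theta))

bounds = [(-5.12, 5.12)] * 2
local = minimize(rastrigin, x0=np.array([3.2, 4.2]), method="Nelder-Mead")
anneal = dual_annealing(rastrigin, bounds=bounds, seed=0)
print("local search from (3.2, 4.2):", local.x.round(3), "objective =", round(local.fun, 3))
print("simulated annealing         :", anneal.x.round(3), "objective =", round(anneal.fun, 3))
```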

Citations: 0
A modified EM-type algorithm to estimate semi-parametric mixtures of non-parametric regressions
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-05-29 | DOI: 10.1007/s11222-024-10435-3
Sphiwe B. Skhosana, Salomon M. Millard, Frans H. J. Kanfer

Semi-parametric Gaussian mixtures of non-parametric regressions (SPGMNRs) are a flexible extension of Gaussian mixtures of linear regressions (GMLRs). The model assumes that the component regression functions (CRFs) are non-parametric functions of the covariate(s) whereas the component mixing proportions and variances are constants. Unfortunately, the model cannot be reliably estimated using traditional methods. A local-likelihood approach for estimating the CRFs requires that we maximize a set of local-likelihood functions. Using the Expectation-Maximization (EM) algorithm to separately maximize each local-likelihood function may lead to label-switching. This is because the posterior probabilities calculated at the local E-step are not guaranteed to be aligned. The consequence of this label-switching is wiggly and non-smooth estimates of the CRFs. In this paper, we propose a unified approach to address label-switching and obtain sensible estimates. The proposed approach has two stages. In the first stage, we propose a model-based approach to address the label-switching problem. We first note that each local-likelihood function is a likelihood function of a Gaussian mixture model (GMM). Next, we reformulate the SPGMNRs model as a mixture of these GMMs. Lastly, using a modified version of the Expectation Conditional Maximization (ECM) algorithm, we estimate the mixture of GMMs. In addition, using the mixing weights of the local GMMs, we can automatically choose the local points where local-likelihood estimation takes place. In the second stage, we propose one-step backfitting estimates of the parametric and non-parametric terms. The effectiveness of the proposed approach is demonstrated on simulated data and real data analysis.
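
To show what can go wrong, and one pragmatic (non-model-based) workaround, the sketch below fits a two-component Gaussian mixture by kernel-weighted EM at each grid point and warm-starts each fit from the previous grid point so that component labels stay aligned. Unlike the SPGMNRs model, all parameters here are estimated locally, and the warm-start heuristic is not the paper's mixture-of-GMMs/ECM approach; the bandwidth, grid and simulated data are illustrative.

```python
# Local kernel-weighted EM for a 2-component 1D GMM, warm-started across a grid.
import numpy as np

def local_weighted_em(x, y, grid, h=0.15, n_iter=100):
    pi = np.array([0.5, 0.5])
    mu = np.quantile(y, [0.25, 0.75])
    sig = np.array([np.std(y)] * 2)
    curves = np.empty((len(grid), 2))
    for g, u in enumerate(grid):
        w = np.exp(-0.5 * ((x - u) / h) ** 2)           # Gaussian kernel weights
        for _ in range(n_iter):
            dens = pi * np.exp(-0.5 * ((y[:, None] - mu) / sig) ** 2) / sig
            r = dens / dens.sum(axis=1, keepdims=True)  # E-step responsibilities
            wr = w[:, None] * r                         # kernel-weighted M-step
            pi = wr.sum(0) / w.sum()
            mu = (wr * y[:, None]).sum(0) / wr.sum(0)
            sig = np.sqrt((wr * (y[:, None] - mu) ** 2).sum(0) / wr.sum(0))
        curves[g] = mu      # warm start: (pi, mu, sig) carry over to the next u
    return curves

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 400)
z = rng.integers(0, 2, 400)                             # latent component labels
y = np.where(z == 0, np.sin(2 * np.pi * x), 2 + 0.5 * x) + rng.normal(0, 0.3, 400)
print(local_weighted_em(x, y, grid=np.linspace(0.05, 0.95, 10)).round(2))
```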

Citations: 0
Generalized fused Lasso for grouped data in generalized linear models
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-05-25 | DOI: 10.1007/s11222-024-10433-5
Mineaki Ohishi

Generalized fused Lasso (GFL) is a powerful method based on adjacent relationships or the network structure of data. It is used in a number of research areas, including clustering, discrete smoothing, and spatio-temporal analysis. When applying GFL, the specific optimization method used is an important issue. In generalized linear models, efficient algorithms based on the coordinate descent method have been developed for trend filtering under the binomial and Poisson distributions. However, to apply GFL to other distributions, such as the negative binomial distribution, which is used to deal with overdispersion in the Poisson distribution, or the gamma and inverse Gaussian distributions, which are used for positive continuous data, an algorithm for each individual distribution must be developed. To unify GFL for distributions in the exponential family, this paper proposes a coordinate descent algorithm for generalized linear models. To illustrate the method, a real data example of spatio-temporal analysis is provided.
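
For reference, the GFL objective for a Poisson GLM on a chain graph can be written down and solved directly as a convex program; the sketch below does so with cvxpy rather than the paper's coordinate-descent algorithm. The chain (first-difference) structure, penalty weight `lam` and simulated data are illustrative assumptions.

```python
# Generalized fused Lasso for a Poisson GLM (log link) on a chain, via cvxpy.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
true_eta = np.repeat([0.5, 2.0, 1.0], 30)            # piecewise-constant log-means
y = rng.poisson(np.exp(true_eta))

n, lam = y.size, 4.0
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)         # first-difference operator
eta = cp.Variable(n)
neg_loglik = cp.sum(cp.exp(eta) - cp.multiply(y, eta))   # Poisson negative log-likelihood
problem = cp.Problem(cp.Minimize(neg_loglik + lam * cp.norm1(D @ eta)))
problem.solve()
print(np.round(eta.value, 2))
```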

Citations: 0
Type I Tobit Bayesian Additive Regression Trees for censored outcome regression
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-05-24 | DOI: 10.1007/s11222-024-10434-4
Eoghan O’Neill

Censoring occurs when an outcome is unobserved beyond some threshold value. Methods that do not account for censoring produce biased predictions of the unobserved outcome. This paper introduces Type I Tobit Bayesian Additive Regression Tree (TOBART-1) models for censored outcomes. Simulation results and real data applications demonstrate that TOBART-1 produces accurate predictions of censored outcomes. TOBART-1 provides posterior intervals for the conditional expectation and other quantities of interest. The error term distribution can have a large impact on the expectation of the censored outcome. Therefore, the error is flexibly modeled as a Dirichlet process mixture of normal distributions. An R package is available at https://github.com/EoghanONeill/TobitBART.
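
The Type I Tobit data augmentation that TOBART-1 builds on can be sketched with a linear model and a fixed normal error standing in for BART and the Dirichlet-process mixture: censored outcomes are imputed from a truncated normal given the current fit, then the regression parameters are updated from the completed data. Everything below (the linear mean, fixed sigma, flat prior, simulated data) is a simplified stand-in for illustration, not the package's implementation.

```python
# Classical Type I Tobit data-augmentation Gibbs sampler (linear mean, fixed sigma).
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n, sigma, c = 500, 1.0, 0.0                       # c = left-censoring threshold
x = rng.normal(size=n)
y_star = 0.5 + 1.5 * x + rng.normal(0, sigma, n)  # latent (uncensored) outcome
y = np.maximum(y_star, c)                         # observed outcome, censored at c
cens = y_star <= c                                # in practice: cens = (y == c)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(2)
draws = np.empty((2000, 2))
for it in range(2000):
    # 1. impute latent outcomes for the censored cases from a truncated normal
    mu_c = X[cens] @ beta
    z = y.copy()
    z[cens] = truncnorm.rvs(a=-np.inf, b=(c - mu_c) / sigma,
                            loc=mu_c, scale=sigma, random_state=rng)
    # 2. update the regression coefficients given the completed data (flat prior)
    beta = rng.multivariate_normal(XtX_inv @ X.T @ z, sigma**2 * XtX_inv)
    draws[it] = beta
print("posterior mean of (intercept, slope):", draws[1000:].mean(axis=0).round(2))
```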

Citations: 0
Group sparse structural smoothing recovery: model, statistical properties and algorithm
IF 2.2 | CAS Region 2 (Mathematics) | Q1 (Mathematics) | Pub Date: 2024-05-23 | DOI: 10.1007/s11222-024-10438-0
Zuoxun Tan, Hu Yang
{"title":"Group sparse structural smoothing recovery: model, statistical properties and algorithm","authors":"Zuoxun Tan, Hu Yang","doi":"10.1007/s11222-024-10438-0","DOIUrl":"https://doi.org/10.1007/s11222-024-10438-0","url":null,"abstract":"","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141107806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0