
Statistics and Computing: Latest Articles

An efficient workflow for modelling high-dimensional spatial extremes
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-06-19 | DOI: 10.1007/s11222-024-10448-y
Silius M. Vandeskog, Sara Martino, Raphaël Huser

We develop a comprehensive methodological workflow for Bayesian modelling of high-dimensional spatial extremes that lets us describe both weakening extremal dependence at increasing levels and changes in the type of extremal dependence class as a function of the distance between locations. This is achieved with a latent Gaussian version of the spatial conditional extremes model that allows for computationally efficient inference with R-INLA. Inference is made more robust using a post hoc adjustment method that accounts for possible model misspecification. This added robustness makes it possible to extract more information from the available data during inference using a composite likelihood. The developed methodology is applied to the modelling of extreme hourly precipitation from high-resolution radar data in Norway. Inference is performed quickly, and the resulting model fit successfully captures the main trends in the extremal dependence structure of the data. The post hoc adjustment is found to further improve model performance.
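
To make the phrase "weakening extremal dependence at increasing levels" concrete, the sketch below estimates the conditional exceedance probability chi(u) = P(Y above its u-quantile | X above its u-quantile) from simulated pairs. It is only an illustration under stated assumptions: the bivariate Gaussian data, the function names `simulate_gaussian_pairs` and `empirical_chi`, and the thresholds 0.90 and 0.99 are hypothetical and not taken from the paper, whose model is a latent Gaussian version of the spatial conditional extremes model fitted with R-INLA.

```python
import math
import random


def simulate_gaussian_pairs(n, rho, seed=0):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho**2) * z2))
    return pairs


def empirical_chi(pairs, u):
    """Empirical P(Y > q_Y(u) | X > q_X(u)), with q the marginal u-quantiles."""
    n = len(pairs)
    qx = sorted(x for x, _ in pairs)[int(u * n)]
    qy = sorted(y for _, y in pairs)[int(u * n)]
    n_x = sum(1 for x, _ in pairs if x > qx)
    n_joint = sum(1 for x, y in pairs if x > qx and y > qy)
    return n_joint / n_x


pairs = simulate_gaussian_pairs(100_000, rho=0.7, seed=0)
chi_90 = empirical_chi(pairs, 0.90)
chi_99 = empirical_chi(pairs, 0.99)
print(f"chi(0.90) = {chi_90:.3f}, chi(0.99) = {chi_99:.3f}")
```

For Gaussian dependence the estimate should shrink as the threshold rises, which is the kind of level-dependent weakening the abstract says the model is designed to describe.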

Citations: 0
Model-based clustering with missing not at random data
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-06-18 | DOI: 10.1007/s11222-024-10444-2
Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki

Model-based unsupervised learning, like any learning task, stalls as soon as missing data occur. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism, remaining vigilant to the relative degrees of freedom of each. Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variables themselves and the class membership. However, we focus on a specific MNAR model, called MNARz, for which the missingness depends only on the class membership. We first underline its ease of estimation by showing that statistical inference can be carried out on the data matrix concatenated with the missing mask, which reduces the problem to a standard MAR mechanism. Consequently, we propose to perform clustering using the Expectation-Maximization algorithm, specially developed for this simplified reinterpretation. Finally, we assess the numerical performance of the proposed methods on synthetic data and on the real medical registry TraumaBase.
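
The idea of working on "the data matrix concatenated with the missing mask" can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper name `augment_with_mask`, the use of `None` to mark a missing entry, and the toy data are all assumptions made here.

```python
from typing import List, Optional

Row = List[Optional[float]]


def augment_with_mask(data: List[Row]) -> List[list]:
    """Append one 0/1 missingness indicator per variable to every row."""
    augmented = []
    for row in data:
        mask = [1 if value is None else 0 for value in row]
        augmented.append(list(row) + mask)
    return augmented


# Toy data matrix with two variables; None marks a missing value.
data = [
    [1.2, None],
    [None, 3.4],
    [0.7, 2.1],
]
augmented = augment_with_mask(data)
for row in augmented:
    print(row)
```

Under the MNARz assumption described in the abstract, missingness depends only on class membership, so the appended indicator columns carry information about the cluster and can plausibly be treated as extra binary variables by a standard mixture-model clustering routine.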

Citations: 0
An efficient method to simulate diffusion bridges
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-06-12 | DOI: 10.1007/s11222-024-10439-z
H. Chau, J. L. Kirkby, D. H. Nguyen, D. Nguyen, N. Nguyen, T. Nguyen

In this paper, we provide a unified approach to simulate diffusion bridges. The proposed method covers a wide range of processes including univariate and multivariate diffusions, and the diffusions can be either time-homogeneous or time-inhomogeneous. We provide a theoretical framework for the proposed method. In particular, using the parametrix representations we show that the approximated probability transition density function converges to that of the true diffusion, which in turn implies the convergence of the approximation. Unlike most of the methods proposed in the literature, our approach does not involve acceptance-rejection mechanisms; that is, it is acceptance-rejection free. Extensive numerical examples illustrate the proposed method and demonstrate its accuracy.
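
A diffusion bridge is a diffusion conditioned to start at one value and end at another at a fixed time. For the special case of standard Brownian motion the bridge can be sampled exactly by pinning an unconditioned path, which the sketch below does. This is a textbook construction shown only to fix ideas; it is not the parametrix-based method of the paper, and the function name `brownian_bridge` and its parameters are chosen here for illustration.

```python
import math
import random


def brownian_bridge(a, b, T, n_steps, rng):
    """Sample a Brownian bridge from a (time 0) to b (time T) on a regular grid."""
    dt = T / n_steps
    # Unconditioned Brownian path started at 0.
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Pin the path: subtract the linear share of the endpoint W_T and
    # add the straight line running from a to b.
    path = []
    for i, w_i in enumerate(w):
        frac = (i * dt) / T
        path.append(a + (b - a) * frac + w_i - frac * w[-1])
    return path


rng = random.Random(0)
path = brownian_bridge(a=1.0, b=-2.0, T=1.0, n_steps=100, rng=rng)
print(path[0], path[-1])
```

No proposal is ever rejected here, which is the sense in which a sampler can be acceptance-rejection free; the paper's contribution is to obtain that property for a much wider class of diffusions.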

Citations: 0
Efficient estimation and correction of selection-induced bias with order statistics
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-06-12 | DOI: 10.1007/s11222-024-10442-4
Yann McLatchie, Aki Vehtari

Model selection aims to identify a sufficiently well-performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible bias when the cross-validation estimates of predictive performance are marred by excessive noise. In finite-data regimes, cross-validated estimates can encourage the statistician to select one model over another when it is not actually better for future data. While this bias remains negligible in the case of few models, when the pool of candidates grows, and model selection decisions are compounded (as in step-wise selection), the expected magnitude of selection-induced bias is likely to grow too. This paper introduces an efficient approach to estimate and correct selection-induced bias based on order statistics. Numerical experiments demonstrate the reliability of our approach in estimating both selection-induced bias and over-fitting along compounded model selection decisions, with specific application to forward search. This work represents a light-weight alternative to more computationally expensive approaches to correcting selection-induced bias, such as nested cross-validation and the bootstrap. Our approach rests on several theoretical assumptions, and we provide a diagnostic to help understand when these may not be valid and when to fall back on safer, albeit more computationally expensive, approaches. The accompanying code facilitates its practical implementation and fosters further exploration in this area.
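
The link between selection and order statistics can be seen in a small simulation. Suppose K candidate models are truly equally good and each performance estimate carries independent Gaussian noise; picking the apparent best then reports the maximum of K noisy values, whose expectation grows with K. The sketch below compares a Monte Carlo estimate of that bias with Blom's classical approximation to the expected largest normal order statistic. It is an illustration under these simplifying assumptions, not the estimator proposed in the paper, and the function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def expected_max_blom(k, sigma=1.0):
    """Blom's approximation to E[max of k iid N(0, sigma^2) draws]."""
    p = (k - 0.375) / (k + 0.25)
    return sigma * NormalDist().inv_cdf(p)


def simulated_selection_bias(k, sigma=1.0, n_rep=20_000, seed=1):
    """Monte Carlo mean of the best of k noisy estimates of a true value of 0."""
    rng = random.Random(seed)
    return mean(
        max(rng.gauss(0.0, sigma) for _ in range(k)) for _ in range(n_rep)
    )


for k in (1, 10, 100):
    print(k, round(expected_max_blom(k), 3))

bias_mc = simulated_selection_bias(10)
bias_blom = expected_max_blom(10)
print(f"K=10: simulated {bias_mc:.3f}, approximated {bias_blom:.3f}")
```

With a single candidate the approximation gives zero bias, and the bias increases as the pool grows, matching the abstract's remark that the problem is negligible for few models but not for many.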

Citations: 0
Jittering and clustering: strategies for the construction of robust designs
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-06-04 | DOI: 10.1007/s11222-024-10436-2
Douglas P. Wiens

We discuss, and give examples of, methods for randomly implementing some minimax robust designs from the literature. These have the advantage, over their deterministic counterparts, of having bounded maximum loss in large and very rich neighbourhoods of the response model fitted by the experimenter, which is almost certainly inexact. Their maximum loss rivals that of the theoretically best possible, but not implementable, minimax designs. The procedures are then extended to more general robust designs. For two-dimensional designs we sample from contractions of Voronoi tessellations, generated by selected basis points, which partition the design space. These ideas are then extended to k-dimensional designs for general k.

Citations: 0
Testing the goodness-of-fit of the stable distributions with applications to German stock index data and Bitcoin cryptocurrency data
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-06-03 | DOI: 10.1007/s11222-024-10441-5
Ruhul Ali Khan, Ayan Pal, Debasis Kundu

Outlier-prone data sets are of immense interest in diverse areas including economics, finance, statistical physics, signal processing, telecommunications and so on. Stable laws (also known as α-stable laws) are often found to be useful in modeling outlier-prone data containing important information and exhibiting heavy-tailed phenomena. In this article, an asymptotic distribution of an unbiased and consistent estimator of the stability index α is proposed based on the jackknife empirical likelihood (JEL) and adjusted JEL methods. Next, using the sum-preserving property of stable random variables and exploiting U-statistic theory, we have developed a goodness-of-fit test procedure for α-stable distributions where the stability index α is specified. Extensive simulation studies are performed in order to assess the finite sample performance of the proposed test. Finally, two appealing real-life data examples related to the daily closing price of the German Stock Index and the Bitcoin cryptocurrency are analysed in detail for illustration purposes.
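
The "sum-preserving property" mentioned above presumably refers to the defining stability relation of stable laws, which can be written out as follows. This is the standard textbook statement, given here as background; the symbols a, b, c and the shift \delta are notation introduced for this note, and the exact form used in the paper's test statistic may differ.

```latex
% Stability: for independent copies X_1, X_2 of a stable random variable X
% with stability index \alpha \in (0, 2], and any constants a, b > 0,
\[
  a X_1 + b X_2 \overset{d}{=} c X + \delta,
  \qquad c^{\alpha} = a^{\alpha} + b^{\alpha},
\]
% for some real shift \delta (with \delta = 0 when X is strictly stable).
% Two familiar special cases:
\[
  \alpha = 2 \ (\text{Gaussian}):\; c = \sqrt{a^{2} + b^{2}},
  \qquad
  \alpha = 1 \ (\text{Cauchy}):\; c = a + b .
\]
```

Because the scaling constant c is pinned down by α alone, a sample can be checked against a specified stability index by comparing sums of observations with suitably rescaled single observations.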

Citations: 0
Insufficient Gibbs sampling
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-31 | DOI: 10.1007/s11222-024-10423-7
Antoine Luciano, Christian P. Robert, Robin J. Ryder

In some applied scenarios, the availability of complete data is restricted, often due to privacy concerns; only aggregated, robust and inefficient statistics derived from the data are made accessible. These robust statistics are not sufficient, but they demonstrate reduced sensitivity to outliers and offer enhanced data protection due to their higher breakdown point. We consider a parametric framework and propose a method to sample from the posterior distribution of parameters conditioned on various robust and inefficient statistics: specifically, the pairs (median, MAD) or (median, IQR), or a collection of quantiles. Our approach leverages a Gibbs sampler and simulates latent augmented data, which facilitates simulation from the posterior distribution of parameters belonging to specific families of distributions. A by-product of these samples from the joint posterior distribution of parameters and data given the observed statistics is that we can estimate Bayes factors based on observed statistics via bridge sampling. We validate and outline the limitations of the proposed methods through toy examples and an application to real-world income data.
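
The two properties claimed for these summary statistics, robustness and lack of sufficiency, show up even in a tiny example. The sketch below computes the pair (median, MAD) for a clean sample and for the same sample with one gross outlier. The data values and the helper name `mad` are made up for illustration; the MAD is taken here as the unscaled median absolute deviation from the median, which is an assumption about the convention.

```python
from statistics import mean, median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    m = median(values)
    return median(abs(v - m) for v in values)


clean = [2.1, 2.5, 2.8, 3.0, 3.3, 3.6, 4.0]
contaminated = clean[:-1] + [400.0]  # replace the largest value by an outlier

for name, sample in (("clean", clean), ("contaminated", contaminated)):
    print(name, "mean:", round(mean(sample), 2),
          "median:", median(sample), "MAD:", round(mad(sample), 3))
```

The mean moves drastically while (median, MAD) stays put, so the pair is robust; at the same time two different data sets map to the same pair, which is why the statistics are not sufficient and why the abstract's sampler has to simulate latent data compatible with the observed values.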

Citations: 0
Optimization of the generalized covariance estimator in noncausal processes
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-31 | DOI: 10.1007/s11222-024-10437-1
Gianluca Cubadda, Francesco Giancaterini, Alain Hecq, Joann Jasiak

This paper investigates the performance of routinely used optimization algorithms in application to the Generalized Covariance estimator (GCov) for univariate and multivariate mixed causal and noncausal models. The GCov is a semi-parametric estimator with an objective function based on nonlinear autocovariances to identify causal and noncausal orders. When the number and type of nonlinear autocovariances included in the objective function are insufficient/inadequate, or the error density is too close to the Gaussian, identification issues can arise. These issues result in local minima in the objective function, which correspond to parameter values associated with incorrect causal and noncausal orders. Then, depending on the starting point and the optimization algorithm employed, the algorithm can converge to a local minimum. The paper proposes the Simulated Annealing (SA) optimization algorithm as an alternative to conventional numerical optimization methods. The results demonstrate that SA performs well in its application to mixed causal and noncausal models, successfully eliminating the effects of local minima. The proposed approach is illustrated by an empirical study of a bivariate series of commodity prices.
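
The failure mode described above, a local optimizer getting trapped depending on its starting point, and the way simulated annealing can escape it, can be reproduced on a toy problem. The sketch below uses a one-dimensional quartic with one local and one global minimum as a stand-in; it is not the GCov objective, and the cooling schedule, step size and function names are arbitrary choices made for illustration.

```python
import math
import random


def objective(x):
    """Toy objective: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def gradient_descent(x0, lr=0.01, n_iter=2000):
    """Plain gradient descent; converges to whichever basin x0 lies in."""
    x = x0
    for _ in range(n_iter):
        x -= lr * (4 * x**3 - 6 * x + 1)
    return x


def simulated_annealing(x0, t0=2.0, cooling=0.999, n_iter=5000, step=0.5, seed=0):
    """Metropolis-type search with geometric cooling; returns the best point seen."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, step)
        f_cand = objective(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if f_cand <= fx or rng.random() < math.exp(-(f_cand - fx) / t):
            x, fx = cand, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f


x_gd = gradient_descent(1.5)
x_sa, f_sa = simulated_annealing(1.5)
print(f"gradient descent: x = {x_gd:.3f}, f = {objective(x_gd):.3f}")
print(f"simulated annealing: x = {x_sa:.3f}, f = {f_sa:.3f}")
```

Both runs start from the same point inside the basin of the local minimum. Accepting occasional uphill moves while the temperature is high is what lets the annealing run cross the barrier, at the cost of many more function evaluations.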

Citations: 0
A modified EM-type algorithm to estimate semi-parametric mixtures of non-parametric regressions
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-29 | DOI: 10.1007/s11222-024-10435-3
Sphiwe B. Skhosana, Salomon M. Millard, Frans H. J. Kanfer

Semi-parametric Gaussian mixtures of non-parametric regressions (SPGMNRs) are a flexible extension of Gaussian mixtures of linear regressions (GMLRs). The model assumes that the component regression functions (CRFs) are non-parametric functions of the covariate(s) whereas the component mixing proportions and variances are constants. Unfortunately, the model cannot be reliably estimated using traditional methods. A local-likelihood approach for estimating the CRFs requires that we maximize a set of local-likelihood functions. Using the Expectation-Maximization (EM) algorithm to separately maximize each local-likelihood function may lead to label-switching. This is because the posterior probabilities calculated at the local E-step are not guaranteed to be aligned. The consequence of this label-switching is wiggly and non-smooth estimates of the CRFs. In this paper, we propose a unified approach to address label-switching and obtain sensible estimates. The proposed approach has two stages. In the first stage, we propose a model-based approach to address the label-switching problem. We first note that each local-likelihood function is a likelihood function of a Gaussian mixture model (GMM). Next, we reformulate the SPGMNRs model as a mixture of these GMMs. Lastly, using a modified version of the Expectation Conditional Maximization (ECM) algorithm, we estimate the mixture of GMMs. In addition, using the mixing weights of the local GMMs, we can automatically choose the local points where local-likelihood estimation takes place. In the second stage, we propose one-step backfitting estimates of the parametric and non-parametric terms. The effectiveness of the proposed approach is demonstrated on simulated data and real data analysis.
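
Label-switching is easiest to see in the simplest setting, a plain two-component Gaussian mixture fitted by EM. The sketch below fits the same simulated data twice with the initial means swapped: both runs find the same two components, but under opposite labels. This is ordinary EM on a one-dimensional mixture, written here only to illustrate why separately maximized local likelihoods need not be aligned; it is not the modified ECM algorithm of the paper, and all names and data settings are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm(data, init_means, n_iter=100):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj
    return weights, means, variances


rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(200)] + \
       [rng.gauss(3.0, 0.5) for _ in range(200)]

_, means_a, _ = em_gmm(data, [min(data), max(data)])
_, means_b, _ = em_gmm(data, [max(data), min(data)])
print("run A means:", [round(m, 2) for m in means_a])
print("run B means:", [round(m, 2) for m in means_b])
```

If fits like these were carried out independently at many local points of a covariate, nothing would force component 1 at one point to correspond to component 1 at the next, which is the source of the wiggly estimates the abstract describes.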

Citations: 0
Generalized fused Lasso for grouped data in generalized linear models
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-05-25 DOI: 10.1007/s11222-024-10433-5
Mineaki Ohishi

Generalized fused Lasso (GFL) is a powerful method based on adjacent relationships or the network structure of data. It is used in a number of research areas, including clustering, discrete smoothing, and spatio-temporal analysis. When applying GFL, the specific optimization method used is an important issue. In generalized linear models, efficient algorithms based on the coordinate descent method have been developed for trend filtering under the binomial and Poisson distributions. However, to apply GFL to other distributions, such as the negative binomial distribution, which is used to deal with overdispersion in the Poisson distribution, or the gamma and inverse Gaussian distributions, which are used for positive continuous data, an algorithm for each individual distribution must be developed. To unify GFL for distributions in the exponential family, this paper proposes a coordinate descent algorithm for generalized linear models. To illustrate the method, a real data example of spatio-temporal analysis is provided.
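To make the trade-off behind GFL concrete, the sketch below evaluates one common form of the objective for grouped Poisson data: the negative log-likelihood with a log link plus a weighted penalty on differences between parameters of adjacent groups. This is an assumption about the general shape of the criterion, not the paper's exact formulation, weights, or coordinate descent algorithm. The function name `gfl_poisson_objective`, the edge list, and the toy counts are hypothetical.

```python
import numpy as np

def gfl_poisson_objective(mu, counts, edges, lam, weights=None):
    """Penalized Poisson negative log-likelihood with a fused-Lasso penalty.

    mu      : one log-mean parameter per group
    counts  : list of arrays, the observations in each group
    edges   : list of (j, l) pairs of adjacent groups
    lam     : non-negative tuning parameter
    weights : optional penalty weight per edge (defaults to 1)
    """
    mu = np.asarray(mu, dtype=float)
    if weights is None:
        weights = np.ones(len(edges))
    # Poisson negative log-likelihood, dropping the log(y!) constant
    nll = sum(np.sum(np.exp(mu[j]) - y * mu[j]) for j, y in enumerate(counts))
    # Fused penalty: weighted absolute differences across adjacent groups
    penalty = sum(w * abs(mu[j] - mu[l]) for w, (j, l) in zip(weights, edges))
    return nll + lam * penalty

# Three groups on a chain 0 - 1 - 2 (toy data)
counts = [np.array([3, 4, 5]), np.array([5, 4, 6]), np.array([20, 22, 19])]
edges = [(0, 1), (1, 2)]

# Candidate 1: each group keeps its own maximum-likelihood log-mean
separate = np.log([c.mean() for c in counts])
# Candidate 2: groups 0 and 1 share the log of their pooled mean
pooled = np.log(np.concatenate(counts[:2]).mean())
fused = np.array([pooled, pooled, separate[2]])

for lam in (0.0, 5.0):
    print(lam,
          gfl_poisson_objective(separate, counts, edges, lam),
          gfl_poisson_objective(fused, counts, edges, lam))
```

With no penalty the unfused estimates give the smaller objective; with a sufficiently large `lam` the candidate that fuses the two similar groups is preferred, which is the clustering behaviour the abstract attributes to GFL.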

Citations: 0