
Computational Statistics: Latest Articles

Sparse Bayesian multidimensional scaling(s).
IF 1.4; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2026-01-01; Epub Date: 2025-12-24; DOI: 10.1007/s00180-025-01696-1
Ami Sheth, Aaron Smith, Andrew J Holbrook

Bayesian multidimensional scaling (BMDS) is a probabilistic dimension reduction tool that allows one to model and visualize data consisting of dissimilarities between pairs of objects. Although BMDS has proven useful within, e.g., Bayesian phylogenetic inference, its likelihood and gradient calculations require burdensome O(N^2) floating-point operations, where N is the number of data points. Thus, BMDS becomes impractical as N grows large. We propose and compare two sparse versions of BMDS (sBMDS) that apply log-likelihood and gradient computations to subsets of the observed dissimilarity matrix data. Landmark sBMDS (L-sBMDS) extracts columns, while banded sBMDS (B-sBMDS) extracts diagonals of the data. These sparse variants let one specify a time complexity between N^2 and N. Under simplified settings, we prove posterior consistency for subsampled distance matrices. Through simulations, we examine the accuracy and computational efficiency across all models using both the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms. We observe approximately 3-fold, 10-fold and 40-fold speedups, with negligible loss of accuracy, when applying the sBMDS likelihoods and gradients to 500, 1,000 and 5,000 data points with 50 bands (landmarks); these speedups only increase with the size of the data considered. Finally, we apply the sBMDS variants to: (1) the phylogeographic modeling of multiple influenza subtypes to better understand how these strains spread through global air transportation networks and (2) the clustering of arXiv manuscripts based on low-dimensional representations of article abstracts. In the first application, sBMDS contributes to holistic uncertainty quantification within a larger Bayesian hierarchical model. In the second, sBMDS approximates uncertainty quantification for a downstream modeling task.
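
To make the column and diagonal subsets concrete, here is a minimal Python sketch that only counts which pairwise dissimilarities each sparse variant would touch. It is not the authors' implementation: the function names are hypothetical, and it assumes that the landmarks are simply the first k objects and that the bands are the first k off-diagonals of the dissimilarity matrix.

```python
from itertools import combinations


def all_pairs(n):
    """Every unordered pair (i, j), i < j: the full N(N-1)/2 dissimilarities."""
    return list(combinations(range(n), 2))


def landmark_pairs(n, k):
    """Pairs that involve at least one of the first k objects (assumed landmarks).

    This corresponds to keeping k columns of the dissimilarity matrix.
    """
    return [(i, j) for i, j in combinations(range(n), 2) if i < k]


def banded_pairs(n, k):
    """Pairs whose indices differ by at most k: the first k off-diagonals."""
    return [(i, j) for i, j in combinations(range(n), 2) if j - i <= k]


n, k = 1000, 50
full = len(all_pairs(n))          # N(N-1)/2 = 499500
sparse = len(banded_pairs(n, k))  # k*N - k(k+1)/2 = 48725
print(full, sparse)
```

Under these assumptions both subsets contain the same number of pairs, k*N - k(k+1)/2, so choosing k between 1 and N - 1 moves the per-evaluation cost between roughly N and N^2 terms.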

Computational Statistics 41(1): 12. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12738595/pdf/
Citations: 0
Approximate Bayesian inference in a model for self-generated gradient collective cell movement.
IF 1.0; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2025-01-01; Epub Date: 2025-03-08; DOI: 10.1007/s00180-025-01606-5
Jon Devlin, Agnieszka Borowska, Dirk Husmeier, John Mackenzie

In this article we explore parameter inference in a novel hybrid discrete-continuum model describing the movement of a population of cells in response to a self-generated chemotactic gradient. The model employs a drift-diffusion stochastic process, rendering likelihood-based inference methods impractical. Consequently, we consider approximate Bayesian computation (ABC) methods, which have gained popularity for models with intractable or computationally expensive likelihoods. ABC involves simulating from the generative model and retaining the parameters whose generated observations are "close enough" to the true data; the retained parameters approximate the posterior distribution. Given the plethora of existing ABC methods, selecting the most suitable one for a specific problem can be challenging. To address this, we employ a simple drift-diffusion stochastic differential equation (SDE) as a benchmark problem. This allows us to assess the accuracy of popular ABC algorithms under known configurations. We also evaluate the bias between ABC-posteriors and the exact posterior for the basic SDE model, where the posterior distribution is tractable. The top-performing ABC algorithms are subsequently applied to the proposed cell movement model to infer its key parameters. This study not only contributes to understanding cell movement but also sheds light on the comparative efficiency of different ABC algorithms in a well-defined context.
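
The simulate-and-compare idea behind ABC can be sketched with plain rejection sampling on a toy drift-diffusion SDE, dX = mu dt + sigma dW. This is only an illustration, not one of the algorithms benchmarked in the article: the uniform prior, the endpoint-based summary statistic, the tolerance `eps` and all function names are assumptions made for the example.

```python
import math
import random


def simulate_endpoint(mu, sigma, t_end, n_steps, rng):
    """Euler-Maruyama simulation of dX = mu dt + sigma dW from X_0 = 0; returns X_T."""
    dt = t_end / n_steps
    x = 0.0
    for _ in range(n_steps):
        x += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def abc_rejection(observed, eps, n_draws, rng, sigma=0.5, t_end=1.0, n_steps=50):
    """Keep prior draws of mu whose simulated summary lies within eps of the data."""
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-2.0, 4.0)  # assumed uniform prior on the drift
        summary = simulate_endpoint(mu, sigma, t_end, n_steps, rng) / t_end
        if abs(summary - observed) <= eps:
            accepted.append(mu)
    return accepted


rng = random.Random(0)  # fixed seed so the run is reproducible
posterior_sample = abc_rejection(observed=1.0, eps=0.2, n_draws=2000, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

Shrinking `eps` brings the accepted draws closer to the exact posterior at the price of a lower acceptance rate, which is the trade-off that more elaborate ABC schemes try to improve.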

Computational Statistics 40(7): 3399-3452. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255578/pdf/
Citations: 0
A powerful penalized multinomial logistic regression approach.
IF 1.4; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2025-01-01; Epub Date: 2025-05-25; DOI: 10.1007/s00180-025-01635-0
Cornelia Fuetterer, Malte Nalenz, Thomas Augustin, Ruth M Pfeiffer

Penalized regression methods that shrink model coefficients are popular approaches to improve prediction and for variable selection in high-dimensional settings. We present a penalized (or regularized) regression approach for multinomial logistic models for categorical outcomes with a novel adaptive L1-type penalty term, which incorporates weights based on intra- and inter-outcome category distances of each predictor. A predictor that has large between- and small within-outcome category distances is penalized less and is more likely to be selected for the final model. We propose and study three measures for weight calculation: an analysis of variance (ANOVA)-based measure and two indices used in clustering approaches. Our novel approach, which we term the discriminative power lasso (DP-lasso), thus combines elements of marginal screening with regularized regression methods. We studied the performance of DP-lasso and other published methods in simulations with varying numbers of outcome categories, numbers of predictors, strengths of associations and predictor correlation structures. For correlated predictors, the DP-lasso approach with ANOVA-based weights (DPan) resulted in much sparser models than other regularization approaches, especially in high-dimensional settings. When the number p of (correlated) predictors was much larger than the available sample size N, DPan had the highest true positive rate while maintaining low false positive rates for all simulation settings. Similarly, when p < N, DPan had high true positive rates and the lowest false positive rates of all methods studied. Thus we recommend DPan for analysing categorical outcomes in relation to high-dimensional predictors. We further illustrate all approaches in ultra high-dimensional settings, using several single-cell RNA-sequencing datasets.
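
The weighting idea can be illustrated with a small sketch in which each predictor receives a penalty weight that decreases with its one-way ANOVA F statistic across outcome categories, to be used in a penalty of the form lambda * sum_j w_j * |beta_j|. The mapping w = 1 / (1 + F) and the function names are assumptions for illustration only; the abstract does not state the exact weight formula used by DPan.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one predictor across outcome categories."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    # Small constant guards against division by zero when groups are constant
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)


def penalty_weights(columns, labels):
    """Hypothetical rule: weight = 1 / (1 + F), so discriminative predictors are penalized less."""
    return [1.0 / (1.0 + anova_f(col, labels)) for col in columns]


labels = [0, 0, 0, 1, 1, 1]
separating = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # large between-, small within-category distance
noisy = [0.0, 5.0, 0.1, 5.1, 0.2, 5.2]       # categories overlap heavily
weights = penalty_weights([separating, noisy], labels)
```

Here the separating predictor ends up with a far smaller weight than the noisy one, so an L1 penalty scaled by these weights would shrink it much less.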

Supplementary information: The online version contains supplementary material available at 10.1007/s00180-025-01635-0.

Computational Statistics 40(8): 4565-4587. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12552268/pdf/
Citations: 0
Misspecification-robust likelihood-free inference in high dimensions.
IF 1.4; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2025-01-01; Epub Date: 2025-05-03; DOI: 10.1007/s00180-025-01607-4
Owen Thomas, Raquel Sá-Leão, Hermínia de Lencastre, Samuel Kaski, Jukka Corander, Henri Pesonen

Likelihood-free inference for simulator-based statistical models has developed rapidly from its infancy to a useful tool for practitioners. However, models with more than a handful of parameters still generally remain a challenge for inference based on Approximate Bayesian Computation (ABC). To advance the possibilities for performing likelihood-free inference in higher dimensional parameter spaces, we introduce an extension of the popular Bayesian optimisation-based approach to approximate discrepancy functions in a probabilistic manner, which lends itself to an efficient exploration of the parameter space. Our approach achieves computational scalability for higher dimensional parameter spaces by using separate acquisition functions, discrepancies, and associated summary statistics for distinct subsets of the parameters. The efficient additive acquisition structure is combined with exponentiated loss-likelihood to provide a misspecification-robust characterisation of posterior distributions for subsets of model parameters. The method successfully performs computationally efficient inference in a moderately sized parameter space and compares favourably to existing modularised ABC methods. We further illustrate the potential of this approach by fitting a bacterial transmission dynamics model to a real data set, which provides biologically coherent results on strain competition in a 30-dimensional parameter space.
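
As a rough illustration of an exponentiated loss-likelihood, the sketch below turns a discrepancy function into posterior weights proportional to prior(theta) * exp(-w * discrepancy(theta)) on a one-dimensional grid. The toy discrepancy, the flat prior, the scaling constant `w` and the function name are all assumptions; the article's surrogate-modelled discrepancies and additive acquisition functions are not reproduced here.

```python
import math


def loss_posterior(grid, discrepancy, prior, w=1.0):
    """Normalized weights proportional to prior(theta) * exp(-w * discrepancy(theta))."""
    unnorm = [prior(t) * math.exp(-w * discrepancy(t)) for t in grid]
    total = sum(unnorm)
    return [u / total for u in unnorm]


grid = [i / 10 for i in range(0, 51)]    # theta values in [0, 5]
discrepancy = lambda t: (t - 2.0) ** 2   # toy discrepancy, minimized at theta = 2
prior = lambda t: 1.0                    # flat prior on the grid
post = loss_posterior(grid, discrepancy, prior, w=4.0)
mode = grid[post.index(max(post))]       # grid point with the largest weight
```

Because the weights depend on the data only through the discrepancy, the construction does not require the simulator to be a correct likelihood model, which is the sense in which such posteriors are described as robust to misspecification.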

Computational Statistics 40(8): 4399-4439. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12552272/pdf/
Citations: 0
Bayes estimation of ratio of scale-like parameters for inverse Gaussian distributions and applications to classification
IF 1.3; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-09-19; DOI: 10.1007/s00180-024-01554-6
Ankur Chakraborty, Nabakumar Jana

We consider two inverse Gaussian populations with a common mean but different scale-like parameters, where all parameters are unknown. We construct noninformative priors for the ratio of the scale-like parameters to derive matching priors of different orders. Reference priors are proposed for different groups of parameters. The Bayes estimators of the common mean and ratio of the scale-like parameters are also derived. We propose confidence intervals of the conditional error rate in classifying an observation into inverse Gaussian distributions. A generalized variable-based confidence interval and the highest posterior density credible intervals for the error rate are computed. We estimate parameters of the mixture of these inverse Gaussian distributions and obtain estimates of the expected probability of correct classification. An intensive simulation study has been carried out to compare the estimators and expected probability of correct classification. Real data-based examples are given to show the practicality and effectiveness of the estimators.
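
The classification setting can be sketched with the standard inverse Gaussian density f(x; mu, lambda) = sqrt(lambda / (2 pi x^3)) * exp(-lambda (x - mu)^2 / (2 mu^2 x)) and a plug-in rule that assigns an observation to the population with the larger density. This sketch assumes that the scale-like parameter corresponds to lambda in this parametrisation, that the parameters are known, and that both populations have equal prior weight; the Bayes estimators and interval procedures of the article are not reproduced.

```python
import math


def ig_pdf(x, mu, lam):
    """Inverse Gaussian density with mean mu and parameter lam, for x > 0."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
    )


def classify(x, mu, lam1, lam2):
    """Assign x to population 1 or 2 by the larger density (equal prior weights assumed)."""
    return 1 if ig_pdf(x, mu, lam1) >= ig_pdf(x, mu, lam2) else 2


mu, lam1, lam2 = 1.0, 1.0, 10.0            # common mean, illustrative parameter values
near_mean = classify(1.0, mu, lam1, lam2)  # the concentrated population (lam2) wins
far_tail = classify(5.0, mu, lam1, lam2)   # the dispersed population (lam1) wins
```

With a common mean, the rule depends only on how the two lambda values compare, which is why the ratio of the scale-like parameters is the natural quantity to estimate.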

Citations: 0
Multivariate approaches to investigate the home and away behavior of football teams playing football matches
IF 1.3; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-09-17; DOI: 10.1007/s00180-024-01553-7
Antonello D’Ambra, Pietro Amenta, Antonio Lucadamo

Compared to other European competitions, participation in the UEFA Champions League is a real “bargain” for football clubs due to the hefty bonuses awarded based on performance during the group qualification phase. Performing successfully in football depends on several multidimensional factors, and analyzing the main ones remains challenging. In the performance study, little attention has been paid to teams’ behavior when playing at home and away. Our study combines statistical techniques to develop a procedure to examine teams’ performance. Several considerations make the 2022–2023 Serie A league season particularly interesting to analyze with our approach. Except for Napoli, all the teams showed different home-and-away behaviors concerning the results obtained at the season’s end. Ball possession and corners positively influenced the points scored in both home and away games, though with different impacts. The precision indicator was not an essential variable. The procedure highlighted the negative roles played by offside, as well as yellow and red cards.

Citations: 0
Kendall correlations and radar charts to include goals for and goals against in soccer rankings
IF 1.3; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-09-17; DOI: 10.1007/s00180-024-01542-w
Roy Cerqueti, Raffaele Mattera, Valerio Ficcadenti

This paper addresses the challenging question of how sporting teams and athletes are ranked in sports competitions. Starting from the paradigmatic case of soccer, we advance a new method for ranking teams in the official national championships through computational statistics methods based on Kendall correlations and radar charts. In detail, we consider the goals for and against the teams in the individual matches as a further source of score assignment beyond the usual win-tie-lose trichotomy. Our approach overcomes some biases in the scoring rules that are currently employed. The methodological proposal is tested over the relevant case of the Italian “Serie A” championships played from 1930 to 2023.
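
For readers unfamiliar with the Kendall correlation, the sketch below computes Kendall's tau-a between two orderings of the same teams, for example by points and by goal difference. The numbers are hypothetical, and the way the article combines such correlations with radar charts to build a ranking is not reproduced here.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


points = [80, 72, 70, 55]          # hypothetical season points for four teams
goal_difference = [40, 45, 20, 5]  # hypothetical goal differences, same team order
tau = kendall_tau(points, goal_difference)  # 5 concordant, 1 discordant pair
```

A value of 1 means the two criteria order the teams identically, while -1 means they order them in exactly opposite ways.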

Citations: 0
Bayesian adaptive lasso quantile regression with non-ignorable missing responses
IF 1.3; Zone 4 (Mathematics); Q3 Statistics & Probability; Pub Date: 2024-09-16; DOI: 10.1007/s00180-024-01546-6
Ranran Chen, Mai Dao, Keying Ye, Min Wang

In this paper, we develop a fully Bayesian adaptive lasso quantile regression model to analyze data with non-ignorable missing responses, which frequently occur in various fields of study. Specifically, we employ a logistic regression model to handle data that are missing through a non-ignorable mechanism. By using the asymmetric Laplace working likelihood for the data and specifying Laplace priors for the regression coefficients, our proposed method extends the Bayesian lasso framework by imposing specific penalization parameters on each regression coefficient, enhancing our estimation and variable selection capability. Furthermore, we embrace the normal-exponential mixture representation of the asymmetric Laplace distribution and the Student-t approximation of the logistic regression model to develop a simple and efficient Gibbs sampling algorithm for generating posterior samples and making statistical inferences. The finite-sample performance of the proposed algorithm is investigated through various simulation studies and a real-data example.
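
The link between quantile regression and the asymmetric Laplace working likelihood can be sketched as follows: the asymmetric Laplace log-density equals a constant minus the quantile check loss, so maximizing it in the location parameter recovers the corresponding sample quantile. This sketch covers only that working likelihood in a location-only setting with hypothetical function names; the adaptive lasso priors, the missing-data model and the Gibbs sampler are not reproduced.

```python
import math


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution with location mu and scale sigma."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def best_location(data, tau, candidates):
    """Candidate location with the highest ALD log-likelihood (lowest total check loss)."""
    return max(candidates, key=lambda m: sum(ald_logpdf(y, m, 1.0, tau) for y in data))


data = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_location(data, 0.5, data)  # 3.0: the sample median, unaffected by the outlier
```

Replacing the location `mu` by a linear predictor gives the regression version, with `tau` selecting which conditional quantile is modelled.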

在本文中,我们开发了一种全贝叶斯自适应套索量子回归模型,用于分析在各个研究领域经常出现的不可忽略的缺失响应数据。具体来说,我们采用逻辑回归模型来处理不可忽略机制的缺失数据。通过对数据使用非对称拉普拉斯工作似然,并为回归系数指定拉普拉斯先验,我们提出的方法扩展了贝叶斯套索框架,对每个回归系数施加了特定的惩罚参数,从而增强了我们的估计和变量选择能力。此外,我们还采用了非对称拉普拉斯分布的正态-指数混合表示法和逻辑回归模型的 Student-t 近似方法,开发了一种简单高效的吉布斯抽样算法,用于生成后验样本并进行统计推断。通过各种模拟研究和一个真实数据示例,研究了所提算法的有限样本性能。
{"title":"Bayesian adaptive lasso quantile regression with non-ignorable missing responses","authors":"Ranran Chen, Mai Dao, Keying Ye, Min Wang","doi":"10.1007/s00180-024-01546-6","DOIUrl":"https://doi.org/10.1007/s00180-024-01546-6","url":null,"abstract":"<p>In this paper, we develop a fully Bayesian adaptive lasso quantile regression model to analyze data with non-ignorable missing responses, which frequently occur in various fields of study. Specifically, we employ a logistic regression model to deal with missing data of non-ignorable mechanism. By using the asymmetric Laplace working likelihood for the data and specifying Laplace priors for the regression coefficients, our proposed method extends the Bayesian lasso framework by imposing specific penalization parameters on each regression coefficient, enhancing our estimation and variable selection capability. Furthermore, we embrace the normal-exponential mixture representation of the asymmetric Laplace distribution and the Student-<i>t</i> approximation of the logistic regression model to develop a simple and efficient Gibbs sampling algorithm for generating posterior samples and making statistical inferences. The finite-sample performance of the proposed algorithm is investigated through various simulation studies and a real-data example.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"94 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Statistical visualisation of tidy and geospatial data in R via kernel smoothing methods in the eks package
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-09-14 DOI: 10.1007/s00180-024-01543-9
Tarn Duong

Kernel smoothers are essential tools for data analysis due to their ability to convey complex statistical information with concise graphical visualisations. Their inclusion in the base distribution and in the many user-contributed add-on packages of the R statistical analysis environment caters well to many practitioners. However, some important gaps remain for specialised data, most notably tidy and geospatial data. The proposed eks package fills these gaps. In addition to kernel density estimation, the package also caters for more complex data analysis situations, such as density derivative estimation, density-based classification (supervised learning) and mean shift clustering (unsupervised learning). We illustrate with experimental data how to obtain and interpret the statistical visualisations for these kernel smoothing methods.

Citations: 0
Using the Krylov subspace formulation to improve regularisation and interpretation in partial least squares regression
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-09-12 DOI: 10.1007/s00180-024-01545-7
Tommy Löfstedt

Partial least squares regression (PLS-R) has been an important regression method in the life sciences and many other fields for decades. However, PLS-R is typically solved using an opaque algorithmic approach, rather than through an optimisation formulation and procedure. There is a clear optimisation formulation of the PLS-R problem based on a Krylov subspace formulation, but it is only rarely considered. The popularity of PLS-R is attributed to the ability to interpret the data through the model components, but the model components are not available when solving the PLS-R problem using the Krylov subspace formulation. We therefore highlight a simple reformulation of the PLS-R problem using the Krylov subspace formulation as a promising modelling framework for PLS-R, and illustrate one of the main benefits of this reformulation: it allows arbitrary penalties on the regression coefficients in the PLS-R model. Further, we propose an approach to estimate the PLS-R model components for the solution found through the Krylov subspace formulation, namely those we would have obtained had we been able to use the common algorithms for estimating the PLS-R model. We illustrate the utility of the proposed method on simulated and real data.
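
The Krylov subspace view mentioned in this abstract can be illustrated with a minimal sketch. It relies on the standard result that, for a single response, the PLS-R coefficients with k components are the least-squares coefficients restricted to the subspace spanned by s, Ss, ..., S^(k-1)s, where S = XᵀX and s = Xᵀy. The function name `krylov_pls_coefficients`, the centring step and the Gram-Schmidt construction are assumptions made for illustration; this is not the paper's algorithm and it includes none of the penalties the paper proposes.

```python
import numpy as np

def krylov_pls_coefficients(X, y, n_components):
    """Least-squares coefficients restricted to the Krylov subspace
    span{s, S s, ..., S^(k-1) s}, with S = X^T X and s = X^T y.

    Hypothetical helper: X and y are mean-centred here by assumption.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    S = X.T @ X
    s = X.T @ y

    # Build an orthonormal basis of the Krylov subspace (Gram-Schmidt).
    basis = []
    v = s
    for _ in range(n_components):
        for b in basis:
            v = v - (b @ v) * b
        v = v / np.linalg.norm(v)
        basis.append(v)
        v = S @ v  # next Krylov direction, orthogonalised on the next pass
    V = np.column_stack(basis)

    # Solve the small least-squares problem inside the subspace,
    # then map the solution back to the original coefficient space.
    z, *_ = np.linalg.lstsq(X @ V, y, rcond=None)
    return V @ z

# Usage on synthetic data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
beta_k2 = krylov_pls_coefficients(X, y, n_components=2)
```

Because the coefficients are written as the solution of an explicit optimisation problem over a subspace, a penalty term could in principle be added to the small least-squares step; that extension is not shown here.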

Citations: 0