
Computational Statistics & Data Analysis: Latest Articles

Simultaneously detecting spatiotemporal changes with penalized Poisson regression models
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-07 DOI: 10.1016/j.csda.2025.108240
Zerui Zhang, Xin Wang, Xin Zhang, Jing Zhang
In large-scale spatiotemporal data, abrupt changes commonly occur across both the spatial and temporal domains. To address the concurrent challenges of detecting change points and identifying spatial clusters in spatiotemporal count data, a method based on the Poisson regression model is introduced. The proposed method employs doubly fused penalization to uncover the underlying spatiotemporal change patterns. To estimate the model efficiently, an iterative shrinkage-thresholding algorithm is developed to minimize the doubly penalized negative log-likelihood. The reliability and accuracy of the method are supported by its statistical consistency properties. Furthermore, extensive numerical experiments validate the theoretical findings and show that the proposed method outperforms existing competing approaches.
Computational Statistics & Data Analysis, Volume 212, Article 108240. Journal Article.
Citations: 0
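To make the doubly fused penalization in the abstract above concrete, the following Python sketch evaluates one plausible form of the objective: a Poisson negative log-likelihood plus L1 penalties on differences between consecutive time points and between neighboring sites. The abstract does not state the exact penalty, so the function names, the log-mean parametrization `theta`, the `neighbors` pair list, and the tuning parameters `lam_time` and `lam_space` are all illustrative assumptions, not the authors' implementation. The `soft_threshold` function is the generic shrinkage step used in iterative shrinkage-thresholding schemes.

```python
import math


def soft_threshold(x, lam):
    """Generic shrinkage step: move x toward zero by lam, clipping at zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def doubly_fused_objective(counts, theta, neighbors, lam_time, lam_space):
    """Penalized Poisson objective (illustrative form, an assumption).

    counts[s][t] : observed count at site s, time t
    theta[s][t]  : log of the Poisson mean at site s, time t
    neighbors    : list of (s, s2) pairs of adjacent sites
    """
    n_sites, n_times = len(counts), len(counts[0])
    # Poisson negative log-likelihood, dropping the constant log(y!) term.
    nll = sum(
        math.exp(theta[s][t]) - counts[s][t] * theta[s][t]
        for s in range(n_sites)
        for t in range(n_times)
    )
    # Temporal fusion: penalize jumps between consecutive time points.
    time_pen = sum(
        abs(theta[s][t] - theta[s][t - 1])
        for s in range(n_sites)
        for t in range(1, n_times)
    )
    # Spatial fusion: penalize differences between neighboring sites.
    space_pen = sum(
        abs(theta[a][t] - theta[b][t])
        for a, b in neighbors
        for t in range(n_times)
    )
    return nll + lam_time * time_pen + lam_space * space_pen
```

A field that is constant over space and time incurs no penalty, while every jump adds cost. That is what pushes estimates toward piecewise-constant patterns whose breaks can be read as change points and cluster boundaries.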
New tests for the identity and sphericity of high-dimensional covariance matrices via U-statistics
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-05 DOI: 10.1016/j.csda.2025.108242
Xiaoge Xiong
Two novel test procedures are proposed for the identity and sphericity of covariance matrices in high-dimensional asymptotic frameworks, both constructed via U-statistics. The limiting distributions of these tests are established under null and local alternative hypotheses. Monte Carlo simulation results demonstrate their superiority over several competing methods across various scenarios, with the proposed tests achieving full power against both dense and sparse alternatives. The effectiveness of the proposed tests is further validated through an application to a colon dataset.
Computational Statistics & Data Analysis, Volume 212, Article 108242. Journal Article.
Citations: 0
High-dimensional and banded integer-valued autoregressive processes
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-04 DOI: 10.1016/j.csda.2025.108243
Nuo Xu, Kai Yang
Modeling high-dimensional time series has long been an appealing and challenging problem. The main difficulties lie in the curse of dimensionality and the complex cross-dependence between adjacent components. To address these problems for high-dimensional time series of counts, a class of high-dimensional banded integer-valued autoregressive processes is proposed that does not assume a distribution for the innovations. A banded thinning structure is constructed to reduce the number of parameters. Componentwise conditional least squares and weighted conditional least squares methods are developed to estimate the banded autoregressive coefficient matrices. The bandwidth parameter is identified via a marginal Bayesian information criterion. Numerical results illustrate the good performance of the estimators. Finally, the advantages of the proposed model are shown in an application to an air quality data set from different cities.
Computational Statistics & Data Analysis, Volume 212, Article 108243. Journal Article.
Citations: 0
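The banded thinning structure described in the abstract above can be sketched as a single update step. This is a minimal illustration under assumptions: the thinning operator is taken to be binomial thinning, the innovations are passed in as data because the paper leaves their distribution unspecified, and the names `binomial_thinning`, `banded_inar_step`, `alpha`, and `bandwidth` are hypothetical rather than taken from the paper.

```python
import random


def binomial_thinning(alpha, x, rng):
    """Binomial thinning of the count x: each unit survives with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def banded_inar_step(x_prev, alpha, bandwidth, innovations, rng):
    """One step of a banded integer-valued autoregression (illustrative sketch).

    Component i at time t depends only on components j with |i - j| <= bandwidth
    at time t - 1; coefficients outside the band are treated as zero.
    """
    p = len(x_prev)
    x_new = []
    for i in range(p):
        lo, hi = max(0, i - bandwidth), min(p - 1, i + bandwidth)
        survivors = sum(
            binomial_thinning(alpha[i][j], x_prev[j], rng) for j in range(lo, hi + 1)
        )
        # Innovations are supplied by the caller; no distribution is assumed here.
        x_new.append(survivors + innovations[i])
    return x_new
```

With a full p × p coefficient matrix there are p² thinning parameters, whereas a bandwidth of k leaves at most (2k + 1)p of them, which is how the band reduces the dimension of the problem.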
Conditional inference for ultrahigh-dimensional additive hazards model
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-04 DOI: 10.1016/j.csda.2025.108244
Meiling Hao, Ruiyu Yang, Fangfang Bai, Liuquan Sun
In high-throughput genomic data, modeling with ultrahigh-dimensional covariates and censored survival outcomes is of great importance. We conduct conditional inference for the ultrahigh-dimensional additive hazards model, allowing both the covariates of interest and the nuisance covariates to be ultrahigh-dimensional. Right censoring of the survival outcomes adds an extra layer of complexity to the data structure, posing significant challenges for the ultrahigh-dimensional additive hazards model. To address this, we introduce a test statistic based on the quadratic norm of the score function. Moreover, when the covariates of interest and the nuisance covariates are highly correlated, we propose a test statistic based on a decorrelated score function to enhance statistical power. Additionally, we establish the limiting distributions of the test statistics under both the null and local alternative hypotheses, further enhancing the computational appeal of our approach. The proposed statistics are thoroughly evaluated through extensive simulation studies and applied to two real data examples.
Computational Statistics & Data Analysis, Volume 212, Article 108244. Journal Article.
Citations: 0
Pure interaction effects unseen by Random Forests
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-01 DOI: 10.1016/j.csda.2025.108237
Ricardo Blum, Munir Hiabu, Enno Mammen, Joseph T. Meyer
Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during tree construction. Motivated by this, it is argued that simple alternative partitioning schemes used in the tree-growing procedure can enhance identification of these interactions. In a simulation study, these variants are compared to conventional Random Forests and Extremely Randomized Trees. The results confirm that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role. Finally, the methods are applied to real datasets.
Computational Statistics & Data Analysis, Volume 212, Article 108237. Journal Article.
Citations: 0
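Why the CART criterion can miss a pure interaction, as the abstract above claims, can be seen on a four-point toy design. The sketch below uses the standard variance-reduction (sum of squared errors) split criterion for regression trees; the toy data and the function names `sse` and `cart_gain` are illustrative assumptions and are not taken from the paper.

```python
def sse(values):
    """Sum of squared deviations from the mean (CART impurity for regression)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def cart_gain(xs, ys, feature, threshold):
    """Impurity decrease from splitting on xs[i][feature] <= threshold."""
    left = [y for x, y in zip(xs, ys) if x[feature] <= threshold]
    right = [y for x, y in zip(xs, ys) if x[feature] > threshold]
    return sse(ys) - sse(left) - sse(right)


# Four design points on the corners of a square.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
pure_interaction = [x1 * x2 for x1, x2 in points]  # no main effects
additive = [x1 + x2 for x1, x2 in points]          # main effects only

for name, ys in [("pure interaction", pure_interaction), ("additive", additive)]:
    gains = [cart_gain(points, ys, f, 0) for f in (0, 1)]
    print(name, gains)
# prints:
# pure interaction [0.0, 0.0]
# additive [4.0, 4.0]
```

For the pure interaction, no single split reduces the impurity, so a greedy criterion has no reason to prefer either feature at the root even though the two features jointly determine the response exactly.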
Variable selection for spatio-temporal conditionally Poisson point processes
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-27 DOI: 10.1016/j.csda.2025.108238
Achmad Choiruddin, Jonatan A. González, Jorge Mateu, Alwan Fadlurohman, Rasmus Waagepetersen
Spatio-temporal point pattern data are becoming prevalent in many scientific disciplines. We consider a sequence of spatial point processes where each point process is Poisson given the past. We model the conditional first-order intensity function of each point process as a parametric log-linear function of spatial, temporal, and spatio-temporal covariates that may depend on previous point patterns. Compared to the purely spatial case, dealing with spatio-temporal covariates brings computational and methodological challenges. We extend regularisation methods for spatial point process variable selection to obtain parsimonious and interpretable models in the spatio-temporal case considered. Using the proposed methodology, we conduct two simulation studies and examine an application to criminal activity in the Kennedy district of Bogota. In the application, we consider spatio-temporal point pattern data of crime locations and a number of spatial, temporal, and spatio-temporal covariates related to urban places, environmental factors, and further space-time factors. The intensity function of vehicle thefts is estimated, with other crimes used as covariate information. The proposed methodology offers a comprehensive approach for analysing spatio-temporal point pattern crime data, capturing complex relationships between covariates and crime occurrences over space and time.
Computational Statistics & Data Analysis, Volume 212, Article 108238. Journal Article.
Citations: 0
A score-based threshold effect test in time series models
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-25 DOI: 10.1016/j.csda.2025.108236
Shufang Wei, Yaping Deng, Yaxing Yang
A score-based test statistic is developed to compare a linear ARMA model with its threshold extension. In particular, the focus is on testing for a threshold effect in continuous threshold models, which have no jump at the threshold. Notably, although developed for continuous threshold models, the proposed test remains effective in discontinuous cases. The proposed test does not require fitting the model under the alternative hypothesis, making it computationally more efficient than the quasi-likelihood ratio test. The asymptotic distributions of the score-based test statistic are derived under both the null hypothesis and local alternatives. Simulations indicate that the proposed test has better size than the quasi-likelihood ratio test and greater power than the Lagrange multiplier test. The asymptotic theory of least squares estimation for the continuous threshold ARMA model is also established. An application to quarterly U.S. civilian unemployment rate data is given.
Computational Statistics & Data Analysis, Volume 212, Article 108236. Journal Article.
Citations: 0
Bayesian selection approach for categorical responses via multinomial probit models
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-20 DOI: 10.1016/j.csda.2025.108233
Chi-Hsiang Chu, Kuo-Jung Lee, Chien-Chin Hsu, Ray-Bing Chen
A multinomial probit model is proposed to examine a categorical response variable, with the main objective being to identify the influential variables in the model. To this end, a Bayesian selection technique using two hierarchical indicators is employed. The first indicator denotes a variable's relevance to the categorical response, and the second indicator relates to the variable's importance at a specific categorical level, which aids in assessing its impact at that level. The selection process relies on posterior samples of the indicators generated by an MCMC algorithm. The efficacy of the Bayesian selection strategy is demonstrated through both simulation and an application to a real-world example.
Computational Statistics & Data Analysis, Volume 212, Article 108233. Journal Article.
Citations: 0
Enhancing approximate modular Bayesian inference by emulating the conditional posterior
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-20 DOI: 10.1016/j.csda.2025.108235
Grant Hutchings, Kellin N. Rumsey, Derek Bingham, Gabriel Huerta
In modular Bayesian analyses, complex models are composed of distinct modules, each representing different aspects of the data or prior information. In this context, fully Bayesian approaches can sometimes lead to undesirable feedback between modules, compromising the integrity of the inference. The “cut-distribution” prevents unwanted influence between modules by “cutting” feedback. The direct sampling (DS) algorithm is standard practice for approximating the cut-distribution, but it can be computationally intensive, especially when the number of imputations required is large. An enhanced method, the Emulating the Conditional Posterior (ECP) algorithm, is proposed, which leverages emulation to increase the number of imputations. Numerical experiments demonstrate that the ECP algorithm outperforms the traditional DS approach in accuracy and computational efficiency, particularly when resources are constrained. It is also shown how the DS algorithm can be improved using ideas from the design of experiments. Some practical recommendations are given for algorithm choice in modular Bayesian analyses.
Computational Statistics & Data Analysis, Volume 212, Article 108235. Journal Article.
Citations: 0
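The two-stage structure of direct sampling from a cut distribution, as described in the abstract above, can be sketched on a toy model. Everything specific here is an assumption made for illustration: the two Gaussian modules with flat priors, the closed-form conditional posteriors, and the names `direct_sampling_cut`, `phi`, and `theta` do not come from the paper.

```python
import random
import statistics


def direct_sampling_cut(z, y, n_imputations, rng):
    """Draw from a toy cut distribution p(phi | z) * p(theta | phi, y).

    Assumed toy model (not from the paper):
      module 1: z_i ~ N(phi, 1), flat prior on phi
      module 2: y_j ~ N(theta + phi, 1), flat prior on theta
    """
    n, m = len(z), len(y)
    z_bar, y_bar = statistics.fmean(z), statistics.fmean(y)
    draws = []
    for _ in range(n_imputations):
        # Step 1: impute phi using module 1 only, so y cannot feed back into phi.
        phi = rng.gauss(z_bar, (1.0 / n) ** 0.5)
        # Step 2: draw theta from its conditional posterior given the imputed phi.
        theta = rng.gauss(y_bar - phi, (1.0 / m) ** 0.5)
        draws.append((phi, theta))
    return draws
```

In this toy model step 2 is a single Gaussian draw. In realistic models the conditional posterior typically has no closed form and needs its own sampler for every imputed value, which is the cost that grows with the number of imputations and that an emulator of the conditional posterior is meant to reduce.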
Model-based clustering for covariance matrices via penalized Wishart mixture models
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-20 DOI: 10.1016/j.csda.2025.108232
Andrea Cappozzo, Alessandro Casa
Covariance matrices provide a valuable source of information about complex interactions and dependencies within the data. However, from a clustering perspective, this information has often been underutilized or overlooked. Indeed, commonly adopted distance-based approaches tend to rely primarily on mean levels to characterize and differentiate between groups. Recently, there have been promising efforts to cluster covariance matrices directly, thereby distinguishing groups solely on the basis of the relationships between variables. From a model-based perspective, a probabilistic formalization has been provided by considering a mixture model whose component densities follow a Wishart distribution. However, this approach faces challenges when dealing with a large number of variables, as the number of parameters to be estimated increases quadratically. To address this issue, a sparse Wishart mixture model is proposed, which assumes that the component scale matrices possess a cluster-dependent degree of sparsity. Model estimation is performed by maximizing a penalized log-likelihood, with a covariance graphical lasso penalty imposed on the component scale matrices. This penalty not only reduces the number of non-zero parameters, mitigating the challenges of high-dimensional settings, but also enhances the interpretability of results by emphasizing the most relevant relationships among variables. The proposed methodology is tested on both simulated and real data, demonstrating its ability to unravel the complexities of neuroimaging data and effectively cluster subjects based on the relational patterns among distinct brain regions.
Model-based clustering for covariance matrices via penalized Wishart mixture models. Andrea Cappozzo, Alessandro Casa. Computational Statistics & Data Analysis, vol. 212, Article 108232. IF 1.5. Pub Date: 2025-06-20. DOI: 10.1016/j.csda.2025.108232
Citations: 0