Computational Statistics & Data Analysis: Latest Articles

Testing the equality of high dimensional distributions
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-09 | DOI: 10.1016/j.csda.2025.108245
Reza Modarres
The Euclidean distance is not a suitable distance for high dimensional settings due to the distance concentration phenomenon. A novel statistic that is inspired by the interpoint distances, but avoids their computation, is proposed for comparing and visualizing high dimensional datasets. The new statistic is based on a high dimensional dissimilarity index that takes advantage of the concentration phenomenon. A simultaneous display of observation means and standard deviations is discussed; it aids visualization and the detection of suspect outliers, and enhances separability among the competing classes in the transformed space. The finite sample convergence of the dissimilarity indices is studied, nine statistics are compared under several distributions, and three applications are presented.
Volume 212, Article 108245
Citations: 0
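The distance concentration phenomenon named in the abstract of "Testing the equality of high dimensional distributions" can be illustrated with a small simulation. The sketch below is not the paper's statistic; it only shows, under the assumption of i.i.d. standard normal coordinates, how the relative spread of Euclidean distances shrinks as the dimension grows. The function name `relative_contrast` and all parameter values are illustrative choices.

```python
import numpy as np


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min of the Euclidean distances from one
    reference point to the remaining points of a Gaussian sample."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, dim))
    # Distances from the first point to every other point.
    d = np.linalg.norm(x[1:] - x[0], axis=1)
    return (d.max() - d.min()) / d.min()


if __name__ == "__main__":
    for dim in (2, 20, 200, 2000):
        # The contrast tends to fall as dim rises: nearest and farthest
        # neighbours become almost equally far away.
        print(dim, round(relative_contrast(dim), 3))
```

A small contrast means that nearest-neighbour and farthest-neighbour distances are hard to tell apart, which is the motivation the abstract gives for avoiding raw interpoint distances.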
Simultaneously detecting spatiotemporal changes with penalized Poisson regression models
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-07 | DOI: 10.1016/j.csda.2025.108240
Zerui Zhang, Xin Wang, Xin Zhang, Jing Zhang
In large-scale spatiotemporal data, abrupt changes commonly occur across both spatial and temporal domains. To address the concurrent challenges of detecting change points and identifying spatial clusters within spatiotemporal count data, a method based on the Poisson regression model is introduced. The proposed method employs doubly fused penalization to reveal the underlying spatiotemporal change patterns. To estimate the model efficiently, an iterative shrinkage-thresholding-based algorithm is developed to minimize the doubly penalized likelihood function. The reliability and accuracy of the method are confirmed by its statistical consistency properties. Furthermore, extensive numerical experiments are conducted to validate the theoretical findings, highlighting the superior performance of the proposed method compared to existing competitive approaches.
Volume 212, Article 108240
Citations: 0
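To make the idea of a doubly fused penalty in the penalized Poisson regression abstract concrete, here is a minimal sketch of what such an objective could look like. It rests on assumptions not stated in the abstract: the log-intensity `theta` is indexed by site and time, one penalty fuses consecutive time points, the other fuses neighbouring sites listed in a hypothetical `edges` argument, and `soft_threshold` is the generic shrinkage operator used by iterative shrinkage-thresholding schemes. The paper's actual model and algorithm may differ.

```python
import numpy as np


def soft_threshold(z, lam):
    """Generic shrinkage operator: move z toward zero by lam, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def doubly_fused_objective(theta, counts, edges, lam_time, lam_space):
    """Poisson negative log-likelihood (up to a constant) plus two fused penalties.

    theta, counts : arrays of shape (n_sites, n_times); theta is the log-intensity.
    edges         : iterable of (site_a, site_b) neighbour pairs (assumed input).
    """
    nll = np.sum(np.exp(theta) - counts * theta)
    # Temporal fusion: penalise jumps between consecutive time points.
    time_pen = np.abs(np.diff(theta, axis=1)).sum()
    # Spatial fusion: penalise differences between neighbouring sites.
    space_pen = sum(np.abs(theta[a] - theta[b]).sum() for a, b in edges)
    return nll + lam_time * time_pen + lam_space * space_pen


if __name__ == "__main__":
    theta = np.zeros((2, 3))
    theta[0, 2] = 1.0  # one jump in time at site 0, which also separates the sites
    counts = np.zeros((2, 3))
    print(doubly_fused_objective(theta, counts, [(0, 1)], 2.0, 3.0))
    print(soft_threshold(np.array([-3.0, 0.5, 2.0]), 1.0))
```

Under this formulation, a zero temporal difference means no change point at that time, and a zero spatial difference places two sites in the same cluster.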
Pure interaction effects unseen by Random Forests
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-01 | DOI: 10.1016/j.csda.2025.108237
Ricardo Blum, Munir Hiabu, Enno Mammen, Joseph T. Meyer
Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during tree construction. Motivated by this, it is argued that simple alternative partitioning schemes used in the tree growing procedure can enhance identification of these interactions. In a simulation study these variants are compared to conventional Random Forests and Extremely Randomized Trees. The results validate that the modifications considered enhance the model's fitting ability in scenarios where pure interactions play a crucial role. Finally, the methods are applied to real datasets.
Volume 212, Article 108237
Citations: 0
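The claim in the Random Forests abstract that the CART criterion struggles with pure interactions can be illustrated with an XOR-type example. This is a sketch chosen for illustration, not necessarily one of the paper's examples: `best_split_gain` is a hypothetical helper that computes the largest per-observation decrease in squared error achievable by a single axis-aligned split, which is the quantity the CART regression criterion maximises.

```python
import numpy as np


def best_split_gain(x, y):
    """Largest decrease in mean squared error from one split on feature x."""
    order = np.argsort(x)
    csum = np.cumsum(y[order])
    n = len(y)
    total = csum[-1]
    left_n = np.arange(1, n)      # number of points in the left child
    left_sum = csum[:-1]
    right_sum = total - left_sum
    # Between-child sum of squares for every candidate split point.
    gain = left_sum**2 / left_n + right_sum**2 / (n - left_n) - total**2 / n
    return gain.max() / n


rng = np.random.default_rng(0)
n = 5000
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y_additive = np.sign(x1) + np.sign(x2)   # main effects only
y_pure = np.sign(x1) * np.sign(x2)       # pure interaction, no main effects

gain_additive = best_split_gain(x1, y_additive)
gain_pure = max(best_split_gain(x1, y_pure), best_split_gain(x2, y_pure))

if __name__ == "__main__":
    print("additive signal, best first split gain:", gain_additive)
    print("pure interaction, best first split gain:", gain_pure)
```

For the additive signal the first split removes a large share of the variance, while for the pure interaction no single split on either feature helps much, so a greedy tree has little guidance on where to cut first.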
Bayesian forecasting of Italian seismicity using the spatiotemporal RETAS model
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-06-02 | DOI: 10.1016/j.csda.2025.108219
Tom Stindl, Zelong Bi, Clara Grazian
Spatiotemporal Renewal Epidemic Type Aftershock Sequence models are self-exciting point processes that model the occurrence time, epicenter, and magnitude of earthquakes in a geographical region. The arrival rate of earthquakes is formulated as the superposition of a main shock renewal process and homogeneous Poisson processes for the aftershocks, motivated by empirical laws in seismology. Existing methods for model fitting rely on maximizing the log-likelihood by either direct numerical optimization or Expectation Maximization algorithms, both of which can suffer from convergence issues and lack adequate quantification of parameter estimation uncertainty. To address these limitations, a Bayesian approach is employed, with posterior inference carried out using a data augmentation strategy within a Markov chain Monte Carlo framework. The branching structure is treated as a latent variable to improve sampling efficiency, and a purpose-built Hamiltonian Monte Carlo sampler is implemented to update the parameters within the Gibbs sampler. This methodology enables parameter uncertainty to be incorporated into forecasts of seismicity. Estimation and forecasting are demonstrated on simulated catalogs and an earthquake catalog from Italy. R code implementing the methods is provided in the Supplementary Materials.
Volume 212, Article 108219
Citations: 0
Joint estimation of precision matrices for long-memory time series
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-06-19 | DOI: 10.1016/j.csda.2025.108234
Qihu Zhang, Jongik Chung, Cheolwoo Park
Methods are proposed for estimating multiple precision matrices for long-memory time series, with particular emphasis on the analysis of resting-state functional magnetic resonance imaging (fMRI) data obtained from multiple subjects. The objective is to estimate both individual brain networks and a common structure representative of a group. Several approaches employing weighted aggregation are introduced to simultaneously estimate individual and group-level precision matrices. Convergence rates of the estimators are examined under various norms and expectations, and their performance is evaluated under both sub-Gaussian and heavy-tailed distributions. The proposed methods are demonstrated through simulated data and real resting-state fMRI datasets.
Volume 212, Article 108234
Citations: 0
Conditional inference for ultrahigh-dimensional additive hazards model
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-04 | DOI: 10.1016/j.csda.2025.108244
Meiling Hao, Ruiyu Yang, Fangfang Bai, Liuquan Sun
In the realm of high-throughput genomic data, modeling with ultrahigh-dimensional covariates and censored survival outcomes is of great importance. We conduct conditional inference for the ultrahigh-dimensional additive hazards model, allowing both the covariates of interest and nuisance covariates to be ultrahigh-dimensional. The presence of right censoring in the survival outcomes adds an extra layer of complexity to the original data structure, posing significant challenges for the ultrahigh-dimensional additive hazards model. To address this, we introduce an innovative test statistic based on the quadratic norm of the score function. Moreover, when there is a high correlation between the covariates of interest and nuisance covariates, we propose a decorrelated score function-based test statistic to enhance statistical power. Additionally, we establish the limiting distributions of the test statistics under both the null and local alternative hypotheses, further enhancing the computational appeal of our approach. The proposed statistics are thoroughly evaluated through extensive simulation studies and applied to two real data examples.
Volume 212, Article 108244
Citations: 0
Modeling continuous distributions in hybrid Bayesian networks using mixtures of polynomials with tails
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-11 | DOI: 10.1016/j.csda.2025.108246
J.C. Luengo, D. Ramos-López, R. Rumí
A new approach to modeling continuous distributions in hybrid Bayesian networks (BNs) is presented. It is based on Mixtures of Polynomials (MoPs) with tails, named tMoPs. This proposal is a variation of the usual MoP model, now including tails and several other improvements in the learning process. Adequate modeling of the tails of variable distributions is relevant both theoretically and for many real applications in which rare phenomena may have a great impact. The proposed approach has been designed to exploit the flexibility of the tMoP model to fit different continuous data distributions. This is especially relevant for distributions with zones of density close to zero, in which polynomial fitting may be difficult. In these situations, tMoPs allow a polynomial fit in parts with higher density and the use of tails in areas with lower density. This permits a better global fit, without loss of overall accuracy, and yields a relatively simple density function. Learning algorithms are developed for tMoP conditional probability distributions with up to two parents of any type. These tMoPs may be integrated into hybrid Bayesian networks to represent conditional probability distributions, making it possible to perform probabilistic reasoning, such as causal inference, sensitivity analysis, and other decision-making operations. The suitability of tMoPs is evaluated in several ways, using a large set of real datasets with data of different natures. The experiments include the analysis of goodness-of-fit with several continuous and pseudo-continuous variables, the optimization of certain parameters and the effect of variable selection and graph structure when using tMoPs in BNs, and finally the evaluation of the predictive ability of hybrid BNs based on tMoPs in classification and regression. Results show the good behavior of the proposal, with the tMoP hybrid Bayesian networks being equally accurate or outperforming other techniques in most scenarios, in addition to providing a more informative and convenient probabilistic model.
Volume 212, Article 108246
Citations: 0
Kernel density estimation for compositional data with zeros via hypersphere mapping
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-11 | DOI: 10.1016/j.csda.2025.108249
Changwon Yoon, Hyunbin Choi, Jeongyoun Ahn
Compositional data—measurements of relative proportions among components—arise frequently in fields ranging from chemometrics to bioinformatics. While density estimation of such data provides crucial insights into their underlying patterns and enables comparative analyses across groups, existing nonparametric approaches are limited, particularly in handling zero components that commonly occur in real-world datasets. We propose a novel kernel density estimation (KDE) method for compositional data that naturally accommodates zero components by exploiting the geometric correspondence between simplices and hyperspheres. This connection to spherical KDE allows us to establish theoretical guarantees, including consistency of the estimator. Through extensive simulations and real data analyses, we demonstrate our method's advantages over existing approaches, particularly in scenarios involving zero components.
Volume 212, Article 108249
Citations: 0
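The geometric correspondence between simplices and hyperspheres mentioned in the compositional-data abstract can be sketched with the square-root transform, a standard way to send a composition to the unit sphere. The abstract does not say which mapping the paper uses, so treat this as an assumption; the function name `simplex_to_sphere` is likewise illustrative. The point of the sketch is that zero components survive the map, whereas logarithm-based transforms do not.

```python
import numpy as np


def simplex_to_sphere(p):
    """Map compositions (non-negative rows summing to 1) to the unit sphere
    by taking component-wise square roots."""
    return np.sqrt(np.asarray(p, dtype=float))


# Two compositions; the first has a zero component.
comp = np.array([[0.5, 0.5, 0.0],
                 [0.2, 0.3, 0.5]])

sphere_points = simplex_to_sphere(comp)
norms = np.linalg.norm(sphere_points, axis=1)

# A log-based transform breaks down on the zero component.
with np.errstate(divide="ignore"):
    log_comp = np.log(comp)

if __name__ == "__main__":
    print("norms on the sphere:", norms)
    print("log of compositions:", log_comp)
```

Because each row sums to one, the square roots have unit Euclidean norm, so density estimation tools built for spherical data become applicable.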
Bayesian selection approach for categorical responses via multinomial probit models
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-06-20 | DOI: 10.1016/j.csda.2025.108233
Chi-Hsiang Chu, Kuo-Jung Lee, Chien-Chin Hsu, Ray-Bing Chen
A multinomial probit model is proposed to examine a categorical response variable, with the main objective being the identification of the influential variables in the model. To this end, a Bayesian selection technique using two hierarchical indicators is employed. The first indicator denotes a variable's relevance to the categorical response, and the subsequent indicator relates to the variable's importance at a specific categorical level, which aids in assessing its impact at that level. The selection process relies on the posterior indicator samples generated through an MCMC algorithm. The efficacy of our Bayesian selection strategy is demonstrated through both simulation and an application to a real-world example.
Volume 212, Article 108233
Citations: 0
Inference on a stochastic SIR model including growth curves 包含生长曲线的随机SIR模型的推论
IF 1.5 3区 数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-12-01 Epub Date: 2025-06-16 DOI: 10.1016/j.csda.2025.108231
Giuseppina Albano, Virginia Giorno, Gema Pérez-Romero, Francisco de Asis Torres-Ruiz
A Susceptible-Infected-Removed stochastic model is presented, in which the stochasticity is introduced through two independent Brownian motions in the dynamics of the Susceptible and Infected populations. To account for the natural evolution of the Susceptible population, a growth function is considered in which size is influenced by the birth and death of individuals. Inference for such a model is addressed by means of a Quasi Maximum Likelihood Estimation (QMLE) method. The resulting nonlinear system can be numerically solved by iterative procedures. A technique to obtain the initial solutions usually required by such methods is also provided. Finally, simulation studies are performed for three well-known growth functions, namely Gompertz, Logistic and Bertalanffy curves. The performance of the initial estimates of the involved parameters is assessed, and the goodness of the proposed methodology is evaluated.
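For readers unfamiliar with the three growth functions named in the abstract, the sketch below writes each one in a common textbook parameterization. The parameter names (K, a, b, c, r) and the specific forms are assumptions for illustration; the paper may use different parameterizations, and the stochastic SIR dynamics themselves are not reproduced here.

```python
import math


def gompertz(t: float, K: float, b: float, c: float) -> float:
    """Gompertz curve: K * exp(-b * exp(-c * t)); tends to K as t grows."""
    return K * math.exp(-b * math.exp(-c * t))


def logistic(t: float, K: float, a: float, r: float) -> float:
    """Logistic curve: K / (1 + a * exp(-r * t)); tends to K as t grows."""
    return K / (1.0 + a * math.exp(-r * t))


def bertalanffy(t: float, K: float, a: float, r: float) -> float:
    """Bertalanffy-type curve: K * (1 - a * exp(-r * t)); tends to K as t grows."""
    return K * (1.0 - a * math.exp(-r * t))


# Evaluate each curve on a small time grid with made-up parameter values.
for t in (0.0, 5.0, 20.0):
    print(t,
          round(gompertz(t, 100.0, 2.0, 0.3), 3),
          round(logistic(t, 100.0, 9.0, 0.5), 3),
          round(bertalanffy(t, 100.0, 0.9, 0.2), 3))
```

All three share a carrying capacity K but differ in how quickly and how symmetrically they approach it, which is one reason a simulation study might compare them side by side.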
Computational Statistics & Data Analysis, Volume 212, Article 108231 (2025-12-01).
Citations: 0