Computational Statistics & Data Analysis: Latest Articles

An integrated method for clustering and association network inference
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-07-01 Epub Date: 2026-01-28 DOI: 10.1016/j.csda.2026.108347
Jeanne Tous, Julien Chiquet
High-dimensional Gaussian graphical models provide a rigorous framework to describe a network of statistical dependencies between variables, such as genes in genomic regulation studies or species in ecology. Penalized methods, including the standard Graphical-Lasso, are well-known approaches to infer the parameters of these models. As the number of variables in the model grows, network inference and interpretation become more complex. The Normal-Block model is discussed: a model that clusters variables and considers a network at the cluster level. This both adds structure to the network and reduces the number of parameters at stake, thereby easing the inference and interpretation of the underlying network. The approach builds on Graphical-Lasso to add a penalty on the network’s edges and limit the detection of spurious dependencies. A zero-inflated version of the model is also proposed to account for real-world data properties. For the inference procedure, two approaches are introduced: a two-step method based on existing approaches, and an original, more rigorous method that simultaneously infers the clustering of variables and the association network between clusters using a penalized variational Expectation-Maximization approach. An implementation of the model in R, in a package called normalblockr, is available on GitHub. The results of the models in terms of clustering and network inference are presented, using both simulated data and various types of real-world data (proteomics and word occurrences on webpages).
Volume 219, Article 108347
Citations: 0
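As a rough illustration of the parameter reduction mentioned in the abstract above, the sketch below counts the free entries of a symmetric precision matrix at the variable level and at the cluster level. It assumes the network is summarized by a single symmetric matrix with a free diagonal. The function name and the sizes (100 variables, 5 clusters) are hypothetical, and the actual Normal-Block parameterization in normalblockr may include further parameters.

```python
def n_precision_params(dim: int) -> int:
    """Free entries of a symmetric dim x dim matrix (diagonal included)."""
    if dim < 0:
        raise ValueError("dim must be non-negative")
    return dim * (dim + 1) // 2


# Hypothetical sizes: 100 variables grouped into 5 clusters.
p, q = 100, 5
variable_level = n_precision_params(p)  # network over all variables
cluster_level = n_precision_params(q)   # network over clusters only
print(variable_level, cluster_level)
```

Under these assumptions the cluster-level network has far fewer entries to estimate and to interpret than the variable-level one.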
Renewable penalized linear regression via inverse probability weighting for streaming data with missing covariates
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-07-01 Epub Date: 2026-01-24 DOI: 10.1016/j.csda.2025.108338
Kang Meng, Yujie Gai
A renewable weighted estimation method for linear regression with non-convex regularization is proposed, tailored for streaming data with missing covariates. The proposed method is implemented via a two-step estimation strategy. In the first step, a renewable formulation of the parameter of interest in the propensity score function is derived. Based on this, a renewable weighted optimization objective for the regression coefficients is constructed in the second step, which is updated using the current data and summary statistics from historical data. The objective is solved via a locally adaptive majorize-minimization algorithm with previous estimates as initialization, while the penalty parameter is determined using the proposed online rolling validation procedure. Theoretical results demonstrate that the renewable estimator is asymptotically normal and maintains estimation efficiency compared to offline methods that process all data at once. Simulation studies and real data analysis further confirm that the proposed estimator achieves competitive statistical performance while significantly improving computational efficiency and reducing memory requirements.
Volume 219, Article 108338
Citations: 0
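The idea of updating an estimate from the current batch plus summary statistics of past data can be sketched as follows. This is a deliberately simplified illustration, not the authors' estimator: it drops the non-convex penalty, handles a single covariate with an intercept, and assumes the inverse-probability weights are supplied by the caller rather than estimated from a propensity model. The class and method names are hypothetical.

```python
class RenewableWLS:
    """Streaming weighted least squares for y = b0 + b1 * x.

    Only running sums (summary statistics) are stored, never past batches.
    Assumptions of this sketch: no penalty term, weights given by the caller.
    """

    def __init__(self):
        self.sw = self.swx = self.swxx = 0.0  # entries of X'WX
        self.swy = self.swxy = 0.0            # entries of X'Wy

    def update(self, xs, ys, ws):
        """Fold one batch into the summary statistics."""
        for x, y, w in zip(xs, ys, ws):
            self.sw += w
            self.swx += w * x
            self.swxx += w * x * x
            self.swy += w * y
            self.swxy += w * x * y

    def estimate(self):
        """Solve the 2x2 weighted normal equations by Cramer's rule."""
        det = self.sw * self.swxx - self.swx ** 2
        if det == 0.0:
            raise ValueError("need at least two distinct x values")
        b0 = (self.swy * self.swxx - self.swx * self.swxy) / det
        b1 = (self.sw * self.swxy - self.swx * self.swy) / det
        return b0, b1


model = RenewableWLS()
model.update([0.0, 1.0], [1.0, 3.0], [1.0, 2.0])  # batch 1
model.update([2.0, 3.0], [5.0, 7.0], [1.5, 1.0])  # batch 2
b0, b1 = model.estimate()
```

Because the weighted normal equations depend on the data only through these sums, the streaming estimate coincides with the one computed from all batches at once. Memory use stays constant in the number of batches.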
A smoothed maximum rank correlation estimator for deep ordinal choice models
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-07-01 Epub Date: 2026-01-21 DOI: 10.1016/j.csda.2026.108345
Yiwei Fan, Xiaoshi Lu, Xiaoling Lu
A smoothed maximum rank correlation (MRC) estimator for ordinal choice models is introduced, combining a linear function with a nonlinear component modeled by deep neural networks to achieve both identifiability and interpretability. A two-step estimation algorithm is designed that maintains the order relations among outputs without relying on the parallelism assumption, which makes it appealing in practice. The statistical properties of the smoothed MRC estimator are established under regular conditions, including identification, convergence rate, and minimax optimality, while allowing the number of categories to increase with sample size. Our theoretical results extend beyond ordinal choice models and apply to a broad range of generalized regression models. Extensive simulations demonstrate the superiority of the proposed method in classification accuracy and interpretability. Its effectiveness is further validated through applications to twelve benchmark datasets and an online education dataset.
Volume 219, Article 108345
Citations: 0
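A smoothed rank correlation objective can be sketched as below. The classical MRC criterion counts pairs whose response order agrees with the order of their fitted scores. The smoothed version here replaces the indicator on the scores by a logistic function with bandwidth `h`. The choice of the logistic smoother, the bandwidth value and the function names are assumptions of this sketch, since the abstract does not specify them, and the scores are taken as given instead of coming from a linear-plus-neural-network model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smoothed_mrc(scores, y, h=0.1):
    """Smoothed rank-correlation objective (larger is better).

    Averages 1[y_i > y_j] * sigmoid((s_i - s_j) / h) over ordered pairs i != j.
    """
    n = len(y)
    if n < 2 or len(scores) != n:
        raise ValueError("need two equal-length sequences with n >= 2")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and y[i] > y[j]:
                total += sigmoid((scores[i] - scores[j]) / h)
    return total / (n * (n - 1))


concordant = smoothed_mrc([0.1, 0.5, 0.9], [1, 2, 3], h=0.01)
reversed_order = smoothed_mrc([0.9, 0.5, 0.1], [1, 2, 3], h=0.01)
```

Scores that rank the observations in the same order as the responses give a larger objective than scores that reverse the order. Unlike the indicator version, the objective is differentiable in the scores, which is what makes gradient-based training possible.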
Automated specification search for composite-based structural equation modeling: A genetic approach
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-07-01 Epub Date: 2026-01-29 DOI: 10.1016/j.csda.2026.108348
Laura Trinchera, Gloria Pietropolli, Mauro Castelli, Florian Schuberth
Structural Equation Modeling (SEM) is primarily employed as a confirmatory approach for empirically testing theoretical models by assessing how well they fit collected data. In practice, researchers frequently take a more exploratory approach and manually assess alternative models. Although automated search techniques have been developed for factor-based SEM to identify the best-fitting model, automated specification search remains largely unexplored in composite-based SEM. To address this gap, a new method is introduced: Automated Genetic Algorithm Specification Search for Partial Least Squares Path Modeling (AGAS-PLS). The proposed algorithm combines partial least squares path modeling with a genetic algorithm to identify the “best” structural model. A Monte Carlo simulation was conducted to assess the ability of AGAS-PLS to accurately identify the structural model of the data-generating process under various conditions, including different sample sizes and levels of model complexity. The practical applicability of AGAS-PLS was further illustrated using empirical data.
Volume 219, Article 108348
Citations: 0
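To make the genetic-search idea concrete, here is a generic sketch in which each candidate structural model is encoded as a bit string, one bit per possible path. It is not the AGAS-PLS algorithm itself: the encoding, the operators (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions. The toy fitness merely stands in for a criterion that would, in practice, be computed by fitting each candidate with partial least squares path modeling.

```python
import random


def genetic_search(fitness, n_bits, pop_size=20, generations=40,
                   mutation_rate=0.1, seed=0):
    """Maximize `fitness` over bit strings of length n_bits (n_bits >= 2).

    The best individual is always carried over (elitism), so the best
    fitness recorded per generation never decreases.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]        # truncation selection
        children = [pop[0][:]]                # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]        # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy stand-in for a model-selection criterion: agreement with a hidden
# "true" set of paths (hypothetical; a real search would fit each model).
true_paths = [1, 0, 1, 1, 0, 0]


def toy_fitness(bits):
    return sum(b == t for b, t in zip(bits, true_paths))


best, history = genetic_search(toy_fitness, n_bits=len(true_paths))
```

Elitism is a design choice that keeps the search from losing the best specification found so far. The price is somewhat less diversity in the population.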
Certifiably optimal direction estimation in sparse single-index model
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-07-01 Epub Date: 2025-12-01 DOI: 10.1016/j.csda.2025.108307
Yangzhou Chen, Lei Yan, Xin Chen, Shuaida He
In this paper, we propose a novel method for coefficient estimation in sparse single-index models (SIM). Our approach employs a customized branch-and-bound algorithm to efficiently solve the non-convex problem of sparse direction estimation, which arises from the discrete nature of variable selection. To address this non-convex optimization problem, we derive upper bounds using techniques such as spectral decomposition, matrix inequalities, and the Gershgorin circle theorem, while the lower bounds are obtained through methods like vector truncation and adaptations of the Rifle algorithm. Furthermore, we design customized branching and node selection strategies, with hyperparameters chosen based on AIC, BIC, and HBIC criteria. We prove the convergence of our algorithm, ensuring it reliably reaches optimal solutions. Extensive simulation studies and real data analysis further illustrate the reliable performance and applicability of our proposed method.
Volume 219, Article 108307
Citations: 0
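One of the ingredients named in the abstract, the Gershgorin circle theorem, yields a cheap upper bound on the largest eigenvalue of a symmetric matrix: every eigenvalue lies within some diagonal entry plus or minus the sum of absolute off-diagonal entries in that row. The sketch below computes that bound, optionally restricted to a candidate support set. How the authors combine it with their other bounds is not stated in the abstract, so the function and its `support` argument are illustrative assumptions.

```python
def gershgorin_upper_bound(a, support=None):
    """Upper bound on the largest eigenvalue of a symmetric matrix.

    If `support` is given, the bound applies to the principal submatrix
    restricted to those indices (a candidate set of selected variables).
    """
    idx = list(range(len(a))) if support is None else list(support)
    if not idx:
        raise ValueError("support must be non-empty")
    return max(a[i][i] + sum(abs(a[i][j]) for j in idx if j != i) for i in idx)


matrix = [[2, 1, 0],
          [1, 2, 1],
          [0, 1, 2]]
full_bound = gershgorin_upper_bound(matrix)             # all three variables
sub_bound = gershgorin_upper_bound(matrix, [0, 1])      # variables 0 and 1 only
```

In a branch-and-bound search for a maximum, a node can be discarded as soon as such an upper bound falls below the best feasible value already found. This is what allows the method to certify optimality without enumerating every support.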
Efficient and robust block designs for order-of-addition experiments
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-07-01 Epub Date: 2026-01-27 DOI: 10.1016/j.csda.2026.108346
Chang-Yun Lin
Designs for Order-of-Addition (OofA) experiments have received growing attention due to their impact on responses based on the sequence of component addition. In certain cases, these experiments involve heterogeneous groups of units, which necessitates the use of blocking to manage variation effects. Despite this, the exploration of block OofA designs remains limited in the literature. As experiments become increasingly complex, addressing this gap is essential to ensure that the designs accurately reflect the effects of the addition sequence and effectively handle the associated variability. Motivated by this, the study seeks to address the gap by expanding the indicator function framework for block OofA designs. The word length pattern is proposed as a criterion for selecting robust block OofA designs. To improve search efficiency and reduce computational demands, an algorithm is developed that employs orthogonal Latin squares for design construction and selection, thereby minimizing the need for exhaustive searches. The analysis, supported by correlation plots, reveals that the algorithms effectively manage confounding and aliasing between effects. Additionally, simulation studies indicate that designs based on the proposed criterion and algorithms achieve power and type I error rates comparable to those of full block OofA designs. This approach offers a practical and efficient method for constructing block OofA designs and may provide valuable insights for future research and applications.
Volume 219, Article 108346
Citations: 0
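For readers unfamiliar with orthogonal Latin squares, the sketch below builds them with the standard modular construction for a prime number of components `m`: entry `(a*i + j) mod m` for multipliers `a = 1, ..., m-1`. It also checks orthogonality directly. How the paper turns such squares into blocked OofA designs is not described in the abstract, so this only illustrates the building block. The function names are hypothetical.

```python
def latin_square(m, a=1):
    """Latin square with entries (a*i + j) mod m; requires gcd(a, m) == 1."""
    return [[(a * i + j) % m for j in range(m)] for i in range(m)]


def are_orthogonal(sq1, sq2):
    """True if superimposing the squares yields every ordered pair once."""
    m = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(m) for j in range(m)}
    return len(pairs) == m * m


square_1 = latin_square(5, a=1)
square_2 = latin_square(5, a=2)
orthogonal = are_orthogonal(square_1, square_2)
```

Each row can be read as one order in which the five components are added. Across the rows of a square, every component occupies every position exactly once, which is the kind of balance a blocked design can exploit.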
Copula-based mixtures of regression models for multivariate response data
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-06-01 Epub Date: 2026-01-14 DOI: 10.1016/j.csda.2026.108340
Xuetong Cui, Orla A. Murphy, Paul D. McNicholas
Clustering is a powerful technique for uncovering hidden patterns or subgroups within complex datasets. Recently, the use of mixtures of multiple linear regression models has gained popularity due to their ability to account for underlying heterogeneity in regression-type data and to provide a comprehensive understanding of covariate impacts across latent subgroups. However, models tailored for a multivariate response are relatively rare, especially when the response variables are dependent. Copula regression addresses this issue by employing copulas to model dependencies between response variables. To address this need, a copula-based finite mixture of regression models is proposed for clustering and interpreting covariate effects in heterogeneous multivariate continuous response data. An expectation-conditional-maximization algorithm is used to estimate the model. Simulation studies and real-data analyses illustrate the improved clustering performance of the proposed models compared to existing methods.
Volume 218, Article 108340
Citations: 0
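The clustering side of a mixture of regressions rests on posterior membership probabilities, computed in the expectation step. The sketch below shows that step for a plain univariate Gaussian mixture of linear regressions. It deliberately omits the copula and the multivariate response that are the paper's actual contribution, and the component parameters are hypothetical values chosen for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def e_step(x, y, components):
    """Posterior group probabilities for one observation (x, y).

    `components` is a list of (weight, intercept, slope, sd) tuples, one per
    latent group, for a univariate response.
    """
    dens = [w * normal_pdf(y, b0 + b1 * x, sd) for w, b0, b1, sd in components]
    total = sum(dens)
    if total == 0.0:
        raise ValueError("observation has zero density under every component")
    return [d / total for d in dens]


# Two hypothetical groups with different regression lines.
components = [(0.5, 0.0, 1.0, 1.0), (0.5, 5.0, -1.0, 1.0)]
resp = e_step(1.0, 1.1, components)
```

In an expectation-conditional-maximization scheme, these probabilities would then serve as weights in a sequence of conditional maximization steps, each updating one block of parameters.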
Likelihood inference in Gaussian copula models for count time series via minimax exponential tilting
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-06-01 Epub Date: 2026-01-20 DOI: 10.1016/j.csda.2026.108344
Quynh Nhu Nguyen, Victor De Oliveira
Count time series arise in diverse contexts and may display a variety of distributional features, including overdispersion, zero-inflation, covariate effects and complex dependence structures. A class of models with the potential to account for this diversity is that of Gaussian copulas, which are computationally challenging to fit. A scalable and accurate likelihood approximation strategy is proposed that employs minimax exponential tilting (MET) to fit Gaussian copula models with arbitrary marginals and ARMA latent processes to count time series. The proposed method, called Time Series Minimax Exponential Tilting (TMET), exploits the exact conditional structure of causal and invertible ARMA processes to construct an optimized importance sampling density. Costly Cholesky decompositions are avoided by using a simplified Innovations algorithm to recursively compute conditional means and variances, and computation is further accelerated through a sparse representation of the best linear prediction matrix. These innovations achieve linear computational complexity in the series length while preserving key theoretical guarantees, including vanishing relative error in rare-event regimes. Simulation studies show that TMET outperforms widely used methods, including the Geweke–Hajivassiliou–Keane (GHK) simulator and the recent Vecchia-based MET (VMET) approach, especially in scenarios with low counts, strong dependence, and moving average latent processes. Beyond estimation, the copula framework is extended to include predictive inference and model diagnostics based on scoring rules and randomized quantile residuals. A real-world application to temperature data from the Kickapoo Downtown Airport in Texas demonstrates TMET’s advantages over the commonly used GHK simulator.
计数时间序列出现在不同的背景下,可能表现出多种分布特征,包括过分散、零膨胀、协变量效应和复杂的依赖结构。一类有可能解释这种多样性的模型是高斯copulas,它在计算上很难拟合。提出了一种可扩展的精确似然逼近策略,利用极小极大指数倾斜(MET)拟合任意边际高斯copula模型和ARMA潜在过程对时间序列进行计数。所提出的方法,称为时间序列极小极大指数倾斜(TMET),利用因果和可逆ARMA过程的精确条件结构来构建优化的重要抽样密度。采用简化的创新算法递归计算条件均值和方差,避免了代价高昂的Cholesky分解,并通过最佳线性预测矩阵的稀疏表示进一步加快了计算速度。这些创新实现了序列长度的线性计算复杂性,同时保留了关键的理论保证,包括在罕见事件政权中消失的相对误差。仿真研究表明,TMET方法优于广泛使用的方法,包括Geweke-Hajivassiliou-Keane (GHK)模拟器和最近基于vechia的MET (VMET)方法,特别是在计数低、依赖性强和移动平均潜在过程的场景下。在估计之外,扩展了copula框架,包括基于评分规则和随机分位数残差的预测推理和模型诊断。对德克萨斯州Kickapoo市中心机场温度数据的实际应用表明,TMET比常用的GHK模拟器具有优势。
{"title":"Likelihood inference in Gaussian copula models for count time series via minimax exponential tilting","authors":"Quynh Nhu Nguyen,&nbsp;Victor De Oliveira","doi":"10.1016/j.csda.2026.108344","DOIUrl":"10.1016/j.csda.2026.108344","url":null,"abstract":"<div><div>Count time series arise in diverse contexts and may display a diversity of distributional features that may include overdispersion, zero–inflation, covariates’ effects and complex dependence structures. A class of models with the potential to account for this diversity is that of Gaussian copulas, which are computationally challenging to fit. A scalable and accurate likelihood approximation strategy is proposed that employs minimax exponential tilting (MET) to fit Gaussian copula models with arbitrary marginals and ARMA latent processes to count time series. The proposed method, called <em>Time Series Minimax Exponential Tilting</em> (TMET), exploits the exact conditional structure of causal and invertible ARMA processes to construct an optimized importance sampling density. Costly Cholesky decompositions are avoided by using a simplified Innovations algorithm to recursively compute conditional means and variances, and further accelerates computation through a sparse representation of the best linear prediction matrix. These innovations achieve linear computational complexity in the series length, while preserving key theoretical guarantees, including vanishing relative error in rare–event regimes. Simulation studies show that TMET outperforms widely used methods, including the Geweke–Hajivassiliou–Keane (GHK) simulator and the recent Vecchia–based MET (VMET) approach, especially in scenarios with low counts, strong dependence, and moving average latent processes. Beyond estimation, the copula framework is extended to include predictive inference and model diagnostics based on scoring rules and randomized quantile residuals. 
A real–world application to temperature data from the Kickapoo Downtown Airport in Texas demonstrates TMET’s advantages over the commonly used GHK simulator.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108344"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Online and offline robust multivariate linear regression
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-06-01 Epub Date: 2026-01-17 DOI: 10.1016/j.csda.2026.108341
Antoine Godichon-Baggioni, Stéphane Robin, Laure Sansonnet
The robust estimation of the parameters of multivariate Gaussian linear regression models is considered by using robust versions of the usual (Mahalanobis) least-squares criterion, with or without Ridge regularization. Two methods of estimation are introduced: (i) online stochastic gradient descent algorithms and their averaged variants, and (ii) offline fixed-point algorithms. These methods are applied to both the standard and Mahalanobis least-squares criteria, as well as to their regularized counterparts. Under weak assumptions, the resulting estimators are shown to be asymptotically normal. Since the noise covariance matrix is generally unknown, a robust estimate of this matrix is incorporated into the Mahalanobis-based stochastic gradient descent algorithms. Numerical experiments on synthetic data demonstrate a substantial gain in robustness compared with classical least-squares estimators, while also highlighting the computational efficiency of the online procedures. All proposed algorithms are implemented in the R package RobRegression, available on CRAN.
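As a rough illustration of option (i) in the abstract above, the sketch below runs an averaged stochastic gradient descent in plain Python. It assumes the robust criterion is the expected unsquared Euclidean norm of the residual, E||Y - BX||, which the abstract does not spell out. The function name `averaged_sgd_robust`, the step-size schedule and its default constants are hypothetical choices made for this example, and the code is not the RobRegression implementation.

```python
import math


def averaged_sgd_robust(xs, ys, step0=1.0, power=0.66):
    """Averaged SGD for min_B E||Y - B X|| (assumed robust criterion).

    xs: list of p-vectors, ys: list of q-vectors.
    Returns the q x p averaged iterate as a list of rows.
    """
    p, q = len(xs[0]), len(ys[0])
    B = [[0.0] * p for _ in range(q)]      # current SGD iterate
    B_bar = [[0.0] * p for _ in range(q)]  # running average of the iterates
    for n, (x, y) in enumerate(zip(xs, ys), start=1):
        # Residual r = y - B x and its Euclidean norm.
        r = [y[i] - sum(B[i][j] * x[j] for j in range(p)) for i in range(q)]
        norm = math.sqrt(sum(v * v for v in r))
        if norm > 0.0:
            gamma = step0 / n ** power  # decreasing step size (assumed schedule)
            # The gradient of ||y - B x|| with respect to B is -(r / ||r||) x^T,
            # so a descent step adds gamma * (r / ||r||) x^T.
            for i in range(q):
                for j in range(p):
                    B[i][j] += gamma * (r[i] / norm) * x[j]
        # Averaging step: B_bar_n = B_bar_{n-1} + (B_n - B_bar_{n-1}) / n.
        for i in range(q):
            for j in range(p):
                B_bar[i][j] += (B[i][j] - B_bar[i][j]) / n
    return B_bar
```

Because each update uses the normalized residual r / ||r||, a single observation with a very large residual moves the estimate no further than any other observation. This is the intuition behind the robustness gain over the squared criterion. Each observation is processed once and then discarded, which is what makes the procedure online.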
Citations: 0
Recursive variational Gaussian approximation with the Whittle likelihood for linear non-Gaussian state space models
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2026-06-01 Epub Date: 2025-12-27 DOI: 10.1016/j.csda.2025.108324
Bao Anh Vu, David Gunawan, Andrew Zammit-Mangion
Parameter inference for linear and non-Gaussian state space models is challenging because the likelihood function contains an intractable integral over the latent state variables. While Markov chain Monte Carlo (MCMC) methods provide exact samples from the posterior distribution as the number of samples goes to infinity, they tend to have high computational cost, particularly for observations of a long time series. When inference with MCMC methods is computationally expensive, variational Bayes (VB) methods are a useful alternative. VB methods approximate the posterior density of the parameters with a simple and tractable distribution found through optimisation. A novel sequential VB algorithm that makes use of the Whittle likelihood is proposed for computationally efficient parameter inference in linear, non-Gaussian state space models. The algorithm, called Recursive Variational Gaussian Approximation with the Whittle Likelihood (R-VGA-Whittle), updates the variational parameters by processing data in the frequency domain. At each iteration, R-VGA-Whittle requires the gradient and Hessian of the Whittle log-likelihood, which are available in closed form. Through several examples involving a linear Gaussian state space model; a univariate/bivariate stochastic volatility model; and a state space model with Student’s t measurement error, where the latent states follow an autoregressive fractionally integrated moving average (ARFIMA) model, R-VGA-Whittle is shown to provide good approximations to posterior distributions of the parameters, and it is very computationally efficient when compared to asymptotically exact methods such as Hamiltonian Monte Carlo.
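To make the frequency-domain idea concrete, the sketch below evaluates a Whittle log-likelihood for a plain AR(1) series using only the Python standard library. It is an illustration under stated assumptions, not the R-VGA-Whittle algorithm: the AR(1) model, the function names, and the convention of summing over the Fourier frequencies strictly between 0 and pi are choices made for this example, and the variational updates, gradient and Hessian described in the abstract are not implemented.

```python
import cmath
import math


def periodogram(x):
    """Periodogram I(w_k) at Fourier frequencies w_k = 2*pi*k/n, k = 1..(n-1)//2."""
    n = len(x)
    mean = sum(x) / n
    freqs, values = [], []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * k / n
        # Direct O(n) discrete Fourier transform of the centered series at w.
        dft = sum((x[t] - mean) * cmath.exp(-1j * w * t) for t in range(n))
        freqs.append(w)
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, values


def ar1_spectral_density(w, phi, sigma2):
    """Spectral density of x_t = phi * x_{t-1} + e_t with Var(e_t) = sigma2."""
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * phi * math.cos(w) + phi * phi))


def whittle_loglik(freqs, pgram, phi, sigma2):
    """Whittle log-likelihood: -sum_k [log f(w_k) + I(w_k) / f(w_k)]."""
    total = 0.0
    for w, i_w in zip(freqs, pgram):
        f = ar1_spectral_density(w, phi, sigma2)
        total -= math.log(f) + i_w / f
    return total
```

The periodogram is computed once from the data. Each candidate parameter value then only requires the model's spectral density at the same frequencies, so no integral over latent states appears in the evaluation.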
Citations: 0