
Computational Statistics & Data Analysis: Latest Articles

Renewable penalized linear regression via inverse probability weighting for streaming data with missing covariates
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-24; DOI: 10.1016/j.csda.2025.108338
Kang Meng, Yujie Gai
A renewable weighted estimation method for linear regression with non-convex regularization is proposed, tailored for streaming data with missing covariates. The proposed method is implemented via a two-step estimation strategy. In the first step, a renewable formulation of the parameter of interest in the propensity score function is derived. Based on this, a renewable weighted optimization objective for the regression coefficients is constructed in the second step, which is updated using the current data and summary statistics from historical data. The objective is solved via a locally adaptive majorize-minimization algorithm with previous estimates as initialization, while the penalty parameter is determined using the proposed online rolling validation procedure. Theoretical results demonstrate that the renewable estimator is asymptotically normal and maintains estimation efficiency compared to offline methods that process all data at once. Simulation studies and real data analysis further confirm that the proposed estimator achieves competitive statistical performance while significantly improving computational efficiency and reducing memory requirements.
Computational Statistics & Data Analysis, Volume 219, Article 108338.
Citations: 0
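The renewable idea in the abstract above, updating an estimate from the current batch plus summary statistics of past batches, can be illustrated with a deliberately simplified sketch. It assumes a single covariate, an unpenalized weighted least-squares fit, and propensity scores that are already known. The paper's non-convex penalty, majorize-minimization solver, renewable propensity estimation, and rolling validation are not reproduced. The class name `RenewableWLS` and the tuple layout of a batch are hypothetical.

```python
class RenewableWLS:
    """Running weighted least squares for y ~ b0 + b1 * x from summary statistics only.

    Illustrative assumption: propensities are known and there is no penalty term.
    """

    def __init__(self):
        # Entries of sum(w * z z^T) and sum(w * z * y), where z = (1, x).
        self.s_w = self.s_wx = self.s_wxx = 0.0
        self.s_wy = self.s_wxy = 0.0

    def update(self, batch):
        """batch: iterable of (x, y, observed, propensity) tuples."""
        for x, y, observed, propensity in batch:
            if not observed:
                continue  # covariate missing: the row gets weight zero
            w = 1.0 / propensity  # inverse probability weight
            self.s_w += w
            self.s_wx += w * x
            self.s_wxx += w * x * x
            self.s_wy += w * y
            self.s_wxy += w * x * y
        return self.coef()

    def coef(self):
        """Solve the 2x2 weighted normal equations (needs two distinct observed x values)."""
        det = self.s_w * self.s_wxx - self.s_wx ** 2
        b1 = (self.s_w * self.s_wxy - self.s_wx * self.s_wy) / det
        b0 = (self.s_wy - b1 * self.s_wx) / self.s_w
        return b0, b1


# Two batches arrive in sequence; only five running sums are kept between them.
model = RenewableWLS()
model.update([(0.0, 1.0, True, 0.5), (1.0, 3.0, True, 0.8), (2.0, 5.0, False, 0.4)])
intercept, slope = model.update([(3.0, 7.0, True, 0.9), (4.0, 9.0, True, 0.6)])
```

Because the running sums are additive, processing batches one at a time gives the same coefficients as a single pass over the pooled data in this toy setting, so no raw historical rows need to be stored.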
A smoothed maximum rank correlation estimator for deep ordinal choice models
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-21; DOI: 10.1016/j.csda.2026.108345
Yiwei Fan, Xiaoshi Lu, Xiaoling Lu
A smoothed maximum rank correlation (MRC) estimator for ordinal choice models is introduced, combining a linear function with a nonlinear component modeled by deep neural networks to achieve both identifiability and interpretability. A two-step estimation algorithm is designed that maintains the order relations among outputs without relying on the parallelism assumption, which makes it appealing in practice. The statistical properties of the smoothed MRC estimator are established under regularity conditions, including identification, convergence rate, and minimax optimality, while allowing the number of categories to increase with sample size. Our theoretical results extend beyond ordinal choice models and apply to a broad range of generalized regression models. Extensive simulations demonstrate the superiority of the proposed method in classification accuracy and interpretability. Its effectiveness is further validated through applications to twelve benchmark datasets and an online education dataset.
Computational Statistics & Data Analysis, Volume 219, Article 108345.
Citations: 0
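To make the smoothing idea in the abstract above concrete, the sketch below evaluates a smoothed rank correlation objective for a purely linear index. It assumes the indicator in the classical maximum rank correlation criterion is replaced by a logistic sigmoid with bandwidth `bandwidth`. The abstract does not state which smoother the authors use, and the deep neural network component and the two-step algorithm are omitted. The function names are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def smoothed_rank_correlation(beta, xs, ys, bandwidth=0.1):
    """Average of sigmoid((x_i - x_j)'beta / bandwidth) over ordered pairs with y_i > y_j.

    Assumption: the sigmoid stands in for the indicator 1{x_i'beta > x_j'beta}.
    """
    n = len(ys)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if ys[i] > ys[j]:
                index_gap = sum(b * (a - c) for b, a, c in zip(beta, xs[i], xs[j]))
                total += sigmoid(index_gap / bandwidth)
    return total / (n * (n - 1))


xs = [(0.0, 1.0), (1.0, 0.5), (2.0, 0.0), (3.0, 2.0)]
ys = [0, 1, 1, 2]  # ordinal categories, with one tie
aligned = smoothed_rank_correlation((1.0, 0.0), xs, ys)   # index ordered like y
flipped = smoothed_rank_correlation((-1.0, 0.0), xs, ys)  # index ordered against y
```

A coefficient vector whose index ranks the observations in the same order as the responses scores higher than one that reverses the ranking, and the smooth objective can be handed to a gradient-based optimizer, which the indicator version cannot.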
Likelihood inference in Gaussian copula models for count time series via minimax exponential tilting
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-20; DOI: 10.1016/j.csda.2026.108344
Quynh Nhu Nguyen, Victor De Oliveira
Count time series arise in diverse contexts and may display a variety of distributional features, including overdispersion, zero-inflation, covariate effects, and complex dependence structures. A class of models with the potential to account for this diversity is that of Gaussian copulas, which are computationally challenging to fit. A scalable and accurate likelihood approximation strategy is proposed that employs minimax exponential tilting (MET) to fit Gaussian copula models with arbitrary marginals and ARMA latent processes to count time series. The proposed method, called Time Series Minimax Exponential Tilting (TMET), exploits the exact conditional structure of causal and invertible ARMA processes to construct an optimized importance sampling density. Costly Cholesky decompositions are avoided by using a simplified Innovations algorithm to recursively compute conditional means and variances, and computation is further accelerated through a sparse representation of the best linear prediction matrix. These innovations achieve linear computational complexity in the series length, while preserving key theoretical guarantees, including vanishing relative error in rare-event regimes. Simulation studies show that TMET outperforms widely used methods, including the Geweke–Hajivassiliou–Keane (GHK) simulator and the recent Vecchia-based MET (VMET) approach, especially in scenarios with low counts, strong dependence, and moving average latent processes. Beyond estimation, the copula framework is extended to include predictive inference and model diagnostics based on scoring rules and randomized quantile residuals. A real-world application to temperature data from the Kickapoo Downtown Airport in Texas demonstrates TMET’s advantages over the commonly used GHK simulator.
Computational Statistics & Data Analysis, Volume 218, Article 108344.
Citations: 0
Online and offline robust multivariate linear regression
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-17; DOI: 10.1016/j.csda.2026.108341
Antoine Godichon-Baggioni, Stéphane Robin, Laure Sansonnet
The robust estimation of the parameters of multivariate Gaussian linear regression models is considered by using robust versions of the usual (Mahalanobis) least-squares criterion, with or without Ridge regularization. Two methods of estimation are introduced: (i) online stochastic gradient descent algorithms and their averaged variants, and (ii) offline fixed-point algorithms. These methods are applied to both the standard and Mahalanobis least-squares criteria, as well as to their regularized counterparts. Under weak assumptions, the resulting estimators are shown to be asymptotically normal. Since the noise covariance matrix is generally unknown, a robust estimate of this matrix is incorporated into the Mahalanobis-based stochastic gradient descent algorithms. Numerical experiments on synthetic data demonstrate a substantial gain in robustness compared with classical least-squares estimators, while also highlighting the computational efficiency of the online procedures. All proposed algorithms are implemented in the R package RobRegression, available on CRAN.
Computational Statistics & Data Analysis, Volume 218, Article 108341.
Citations: 0
Boosted sliced regression for dimension reduction in binary classification
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-17; DOI: 10.1016/j.csda.2026.108342
Qin Wang, Edmund Osei
Sufficient dimension reduction (SDR) aims at reducing the data dimensionality without loss of the information on the conditional distribution between the response and its high dimensional predictors. Most existing SDR methods were developed under a general regression model, and may lose efficiency when the response is binary. A novel approach is proposed in this study. It combines the gradient boosting machines (GBM) and the sliced regression (SR) to effectively recover the central dimension reduction subspace in binary classification. Numerical experiments and real data applications demonstrate its superior performance and scalability in computation.
Computational Statistics & Data Analysis, Volume 218, Article 108342.
Citations: 0
Copula-based mixtures of regression models for multivariate response data
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-14; DOI: 10.1016/j.csda.2026.108340
Xuetong Cui, Orla A. Murphy, Paul D. McNicholas
Clustering is a powerful technique for uncovering hidden patterns or subgroups within complex datasets. Recently, the use of mixtures of multiple linear regression models has gained popularity due to their ability to account for underlying heterogeneity in regression-type data and to provide a comprehensive understanding of covariate impacts across latent subgroups. However, models tailored for a multivariate response are relatively rare, especially when the response variables are dependent. Copula regression addresses this issue by employing copulas to model dependencies between response variables. To address this need, a copula-based finite mixture of regression models is proposed for clustering and interpreting covariate effects in heterogeneous multivariate continuous response data. An expectation-conditional-maximization algorithm is used to estimate the model. Simulation studies and real-data analyses illustrate the improved clustering performance of the proposed models compared to existing methods.
Computational Statistics & Data Analysis, Volume 218, Article 108340.
Citations: 0
Conditional independence test in factor models via projection correlation
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2026-01-09; DOI: 10.1016/j.csda.2026.108339
Xilin Zhang, Hongxia Xu, Guoliang Fan, Liping Zhu
Among existing methods for testing independence, projection correlation possesses several appealing properties: it is insensitive to the dimensions of the two random vectors, invariant under orthogonal transformations, and requires no tuning parameters or moment conditions for its estimation. This paper proposes a projection correlation-based approach for measuring and testing conditional dependence within a factor model framework. The proposed measure accommodates response vectors and common factors of varying dimensions while allowing the number of factors to grow to infinity with the sample size. The asymptotic properties of the projection correlation statistic are established under both the null and alternative hypotheses. In addition, a general approach is introduced for constructing dependency graphs without the Gaussian assumption, utilizing the proposed test. Numerical simulations and real data analysis demonstrate the superiority and practicality of the proposed methods.
Computational Statistics & Data Analysis, Volume 218, Article 108339.
Citations: 0
Expectile periodogram
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2025-12-31; DOI: 10.1016/j.csda.2025.108337
Tianbo Chen, Ta-Hsin Li, Hanbing Zhu, Wenwu Gao
This paper introduces a novel periodogram-like function, called the expectile periodogram (EP), for modeling spectral features of time series and detecting hidden periodicities. The EP is constructed from trigonometric expectile regression (ER), in which a specially designed loss function is used to substitute the squared ℓ2 norm that leads to the ordinary periodogram. The EP retains the key properties of the ordinary periodogram as a frequency-domain representation of serial dependence in time series, while offering a more comprehensive understanding by examining the data across the entire range of expectile levels. The asymptotic theory is established to investigate the relationship between the EP and the so-called expectile spectrum. Simulations demonstrate the efficiency of the EP in the presence of hidden periodicities. In addition, by leveraging the inherent two-dimensional nature of the EP, we train a deep learning model to classify earthquake waveform data. Notably, our approach outperforms alternative periodogram-based methods in terms of classification accuracy.
Computational Statistics & Data Analysis, Volume 217, Article 108337.
Citations: 0
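The loss behind expectile regression, mentioned in the abstract above, is in its standard form the squared error with asymmetric weights: weight tau on non-negative residuals and 1 - tau on negative ones. As a minimal sketch (the function names are hypothetical, and this is the constant-only case rather than the trigonometric regression from which the authors build the EP), the code below computes a sample expectile by iteratively reweighted averaging.

```python
def expectile_loss(residual, tau):
    """Asymmetric squared loss: tau on non-negative residuals, 1 - tau on negative ones."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual * residual


def sample_expectile(values, tau, iterations=100):
    """Minimize the summed expectile loss over a constant m.

    Each pass recomputes the asymmetric weights at the current m and takes
    the weighted mean, which is the exact minimizer for those fixed weights.
    """
    m = sum(values) / len(values)
    for _ in range(iterations):
        weights = [tau if v >= m else 1.0 - tau for v in values]
        m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return m


data = [1.0, 2.0, 3.0, 10.0]
low = sample_expectile(data, 0.1)
middle = sample_expectile(data, 0.5)  # coincides with the ordinary mean
high = sample_expectile(data, 0.9)
```

At tau = 0.5 the loss is symmetric and the expectile is the sample mean, which mirrors how the ordinary periodogram arises from the squared loss; moving tau toward 0 or 1 shifts attention to the lower or upper part of the distribution.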
Random multiplication versus random sum: Autoregressive-like models with integer-valued random inputs
IF 1.6; Zone 3 (Mathematics); Q3, Computer Science, Interdisciplinary Applications; Pub Date: 2025-12-27; DOI: 10.1016/j.csda.2025.108323
Abdelhakim Aknouche, Sónia Gouveia, Manuel G. Scotto
A common approach to analyze time series of counts is to fit models based on random sum operators. As an alternative, this paper introduces time series models based on a random multiplication operator, which is simply the multiplication of a variable operand by an integer-valued random coefficient, whose mean is the constant operand. Such an operation is endowed into autoregressive-like models with integer-valued random inputs, addressed as RMINAR. Two special variants are studied, namely the ℕ₀-valued random coefficient autoregressive model and the ℕ₀-valued random coefficient multiplicative error model. Furthermore, ℤ-valued extensions are also considered. The dynamic structure of the proposed models is studied in detail. In particular, their corresponding solutions are everywhere strictly stationary and ergodic, which is not common in either the literature on integer-valued time series models or real-valued random coefficient autoregressive models. Therefore, RMINAR model parameters are estimated using a four-stage weighted least squares estimator, with consistency and asymptotic normality established everywhere in the parameter space. Finally, the performance of the new RMINAR models is illustrated with simulated and empirical examples.
Computational Statistics & Data Analysis, Volume 217, Article 108323.
Citations: 0
Pure error REML for analyzing data from multi-stratum designs
IF 1.6 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-25 DOI: 10.1016/j.csda.2025.108322
Steven G. Gilmour, Peter Goos, Heiko Großmann
Since the dawn of response surface methodology, it has been recommended that designs include replicate points, so that pure error estimates of variance can be obtained and used to provide reliable estimated standard errors of the effects of factors. In designs with more than one stratum, such as split-plot and split-split-plot designs, it is less obvious how pure error estimates of the variance components should be obtained, and no pure error estimates are given by the popular residual maximum likelihood (REML) method of estimation. A method of pure error REML estimation of the variance components, using the full treatment model, is obtained by treating each combination of factor levels as a discrete treatment. This method is easy to implement using standard software, and improved estimated standard errors of the fixed effects estimates can be obtained by applying the Kenward-Roger correction based on the pure error REML estimates. The new method is illustrated using several data sets, and the performance of pure error REML is compared with the standard REML method. The results are comparable when the assumed response surface model is correct, but the new method is considerably more robust in the case of model misspecification.
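For intuition only, the idea of treating each combination of factor levels as a discrete treatment can be shown in the simplest single-stratum case, where the classical pure error variance is the pooled within-replicate variance. The sketch below is not the multi-stratum REML procedure described in the abstract; the function name `pure_error_variance` and the example data are hypothetical.

```python
from collections import defaultdict


def pure_error_variance(runs):
    """Pooled within-treatment variance for a single-stratum design.

    runs: iterable of (factor_levels, response) pairs. Each distinct tuple of
    factor levels is treated as its own discrete treatment.
    Returns (variance_estimate, degrees_of_freedom).
    """
    groups = defaultdict(list)
    for levels, y in runs:
        groups[tuple(levels)].append(y)

    sum_sq, df = 0.0, 0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        sum_sq += sum((y - mean) ** 2 for y in ys)
        df += len(ys) - 1  # unreplicated combinations contribute nothing

    if df == 0:
        raise ValueError("no replicated factor combinations, so no pure error")
    return sum_sq / df, df


if __name__ == "__main__":
    # Hypothetical two-factor runs: (coded levels, response).
    runs = [
        ((-1, -1), 10.0), ((-1, -1), 12.0),
        ((1, -1), 15.0), ((1, -1), 15.0),
        ((0, 0), 20.0), ((0, 0), 21.0), ((0, 0), 22.0),
        ((1, 1), 30.0),
    ]
    print(pure_error_variance(runs))
```

Because the estimate uses only variation among runs that share identical factor settings, it does not depend on which response surface model is assumed, which is the property the abstract relies on for robustness to misspecification.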
Computational Statistics & Data Analysis, vol. 218, Article 108322.
Citations: 0