Computational Statistics & Data Analysis: Latest Articles

Estimating a smooth covariance for functional data
IF 1.6 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2025-08-06 | DOI: 10.1016/j.csda.2025.108255
Uche Mbaka, James Owen Ramsay, Michelle Carey
Functional data analysis frequently involves estimating a smooth covariance function from observed data. This estimate is essential for understanding interactions among functions and underlies many advanced methods, including functional principal component analysis. Two approaches for estimating smooth covariance functions in the presence of measurement error are introduced. The first method employs a low-rank approximation of the covariance matrix, while the second ensures positive definiteness via a Cholesky decomposition. Both approaches use penalized regression to produce smooth covariance estimates and have been validated through comprehensive simulation studies. The practical application of these methods is demonstrated on average weekly milk yields in dairy cows and on egg-laying patterns of Mediterranean fruit flies.
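To illustrate why a Cholesky parameterization guarantees positive definiteness, here is a minimal Python sketch. It is not the authors' estimator: the function name `covariance_from_cholesky`, the log-diagonal parameterization, and the toy dimension are assumptions made for illustration. Any unconstrained vector is mapped to a lower-triangular factor with a strictly positive diagonal, so the product of the factor with its transpose is positive definite by construction.

```python
import numpy as np


def covariance_from_cholesky(params: np.ndarray, dim: int) -> np.ndarray:
    """Map an unconstrained vector to a positive definite matrix.

    The first `dim` entries are log-diagonal values of a lower-triangular
    factor; the remaining entries fill the strictly lower triangle.
    (Hypothetical helper, for illustration only.)
    """
    factor = np.zeros((dim, dim))
    factor[np.diag_indices(dim)] = np.exp(params[:dim])  # strictly positive diagonal
    factor[np.tril_indices(dim, k=-1)] = params[dim:]    # free off-diagonal entries
    return factor @ factor.T


dim = 4
rng = np.random.default_rng(0)
theta = rng.normal(size=dim * (dim + 1) // 2)  # arbitrary unconstrained parameters
sigma = covariance_from_cholesky(theta, dim)
min_eig = np.linalg.eigvalsh(sigma).min()
print("smallest eigenvalue is positive:", bool(min_eig > 0))
```

In a penalized regression setting one would optimize over `theta` rather than over the covariance entries directly, so no explicit positive definiteness constraint is needed.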
Computational Statistics & Data Analysis, Volume 213, Article 108255
Citations: 0
Random effects misspecification and its consequences for prediction in generalized linear mixed models
IF 1.6 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2025-07-29 | DOI: 10.1016/j.csda.2025.108254
Quan Vu, Francis K.C. Hui, Samuel Muller, A.H. Welsh
When fitting generalized linear mixed models, choosing the random effects distribution is an important decision. As random effects are unobserved, misspecification of their distribution is a real possibility. Thus, the consequences of random effects misspecification for point prediction and prediction inference of random effects in generalized linear mixed models need to be investigated. A combination of theory, simulation, and a real application is used to explore the effect of using the common normality assumption for the random effects distribution when the correct specification is a mixture of normal distributions, focusing on the impacts on point prediction, mean squared prediction errors, and prediction intervals. Results show that the level of shrinkage for the predicted random effects can differ greatly under the two random effect distributions, and so is susceptible to misspecification. Also, the unconditional mean squared prediction errors for the random effects are almost always larger under the misspecified normal random effects distribution, while results for the mean squared prediction errors conditional on the random effects are more complicated but remain generally larger under the misspecified distribution (especially when the true random effect is close to the mean of one of the component distributions in the true mixture distribution). Results for prediction intervals indicate that the overall coverage probability is, in contrast, not greatly impacted by misspecification. It is concluded that misspecifying the random effects distribution can affect prediction of random effects, and greater caution is recommended when adopting the normality assumption in generalized linear mixed models.
Computational Statistics & Data Analysis, Volume 213, Article 108254
Citations: 0
Empirical likelihood based Bayesian variable selection
IF 1.6 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2025-08-13 | DOI: 10.1016/j.csda.2025.108258
Yichen Cheng, Yichuan Zhao
Empirical likelihood is a popular nonparametric statistical tool that does not require any distributional assumptions. The possibility of conducting variable selection via Bayesian empirical likelihood is studied both theoretically and empirically. Theoretically, it is shown that when the prior distribution satisfies certain mild conditions, the corresponding Bayesian empirical likelihood estimators are posteriorly consistent and variable selection consistent. As special cases, the priors of Bayesian empirical likelihood LASSO and SCAD satisfy such conditions and thus can identify the non-zero elements of the parameters with probability approaching 1. In addition, it is easy to verify that those conditions are met for other widely used priors such as ridge, elastic net and adaptive LASSO. Empirical likelihood depends on a parameter that needs to be obtained by numerically solving a non-linear equation. Thus, there exists no conjugate prior for the posterior distribution, which causes slow convergence of the MCMC sampling algorithm in some cases. To solve this problem, an approximating distribution is used as the proposal to enhance the acceptance rate and, therefore, facilitate faster computation. The computational results demonstrate quick convergence for the examples used in the paper. Both simulations and real data analyses are performed to illustrate the advantages of the proposed methods.
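The remark that empirical likelihood requires numerically solving a non-linear equation can be made concrete in the simplest textbook case, the mean of a scalar sample. The sketch below is an illustration under that assumption, not the regression setting or the MCMC scheme of the paper; the function name `el_log_ratio` and the choice of bisection are hypothetical. For a candidate mean `mu`, the Lagrange multiplier `lam` solves sum(z / (1 + lam * z)) = 0 with z = x - mu, and the log empirical likelihood ratio is -sum(log(1 + lam * z)).

```python
import numpy as np


def el_log_ratio(x, mu, tol=1e-12, max_iter=200):
    """Log empirical likelihood ratio for a scalar mean (illustrative sketch)."""
    z = np.asarray(x, dtype=float) - mu
    if z.min() >= 0 or z.max() <= 0:
        return -np.inf  # mu lies outside the interior of the data range
    # All weights stay positive only for lam in (-1/max(z), -1/min(z)).
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        score = np.sum(z / (1.0 + lam * z))  # strictly decreasing in lam
        if score > 0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * z))


rng = np.random.default_rng(0)
x = rng.exponential(size=50)
at_mean = el_log_ratio(x, x.mean())          # maximized at the sample mean
shifted = el_log_ratio(x, x.mean() + 0.3)    # strictly smaller away from it
print(at_mean, shifted)
```

Because every evaluation of the likelihood needs such an inner solve, the posterior has no closed form, which is consistent with the abstract's motivation for a tailored proposal distribution.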
Computational Statistics & Data Analysis, Volume 213, Article 108258
Citations: 0
Inferring the dynamics of quasi-reaction systems via nonlinear local mean-field approximations
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2025-07-22 | DOI: 10.1016/j.csda.2025.108251
Matteo Framba, Veronica Vinciotti, Ernst C. Wit
Parameter estimation of kinetic rates in stochastic quasi-reaction systems can be challenging, particularly when the time gap between consecutive measurements is large. Local linear approximation approaches account for the stochasticity in the system but fail to capture the intrinsically nonlinear nature of the mean dynamics of the process. Moreover, the mean dynamics of a quasi-reaction system can be described by a system of ODEs, which have an explicit solution only for simple unitary systems. An approximate analytical solution is derived for generic quasi-reaction systems via a first-order Taylor approximation of the hazard rate. This allows a nonlinear forward prediction of the future dynamics given the current state of the system. Predictions and corresponding observations are embedded in a nonlinear least-squares approach for parameter estimation. The performance of the algorithm is compared to existing methods via a simulation study. Besides the generality of the approach in the specification of the quasi-reaction system and the gains in computational efficiency, the results show an improvement in the kinetic rate estimation, particularly for data observed at large time intervals. Additionally, the availability of an explicit solution makes the method robust to stiffness, which is often present in biological systems. Application to Rhesus Macaque data illustrates the use of the method in the study of cell differentiation.
Computational Statistics & Data Analysis, Volume 213, Article 108251
Citations: 0
On Jeffreys's cardioid distribution
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2025-07-16 | DOI: 10.1016/j.csda.2025.108248
Arthur Pewsey
The cardioid distribution, despite being one of the fundamental models for circular data, has received limited attention both methodologically and in terms of its implementation in R. To redress these shortcomings, published results on the model are summarized, corrected and extended, and the scope and limitations of the existing support for the model in R identified. A thorough investigation into the performance of trigonometric moment and maximum likelihood based approaches to point and interval estimation of the model's location and concentration parameters is presented, and goodness-of-fit techniques outlined. A suite of reliable R functions is provided for the model's practical application. The application of the proposed inferential methods and R functions is illustrated by an analysis of palaeocurrent cross-bed azimuths.
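As a rough illustration of trigonometric moment estimation for the cardioid model, the following Python sketch uses the standard density f(theta) = (1 + 2 rho cos(theta - mu)) / (2 pi) with 0 <= rho <= 1/2. It is not the R suite described in the paper; the function names and the rejection sampler are assumptions made for this example. Because the first trigonometric moment of the cardioid has mean direction mu and mean resultant length rho, the sample mean direction and the sample mean resultant length (truncated at 1/2) serve as moment estimates.

```python
import numpy as np


def cardioid_pdf(theta, mu, rho):
    """Cardioid density on the circle, valid for 0 <= rho <= 0.5."""
    return (1.0 + 2.0 * rho * np.cos(theta - mu)) / (2.0 * np.pi)


def sample_cardioid(n, mu, rho, rng):
    """Rejection sampler with a uniform proposal (illustrative, not optimized)."""
    out = np.empty(0)
    while out.size < n:
        cand = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
        u = rng.uniform(size=2 * n)
        keep = u * (1.0 + 2.0 * rho) <= 1.0 + 2.0 * rho * np.cos(cand - mu)
        out = np.concatenate([out, cand[keep]])
    return out[:n]


def moment_estimates(theta):
    """Trigonometric moment estimates of location mu and concentration rho."""
    c, s = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    mu_hat = np.arctan2(s, c) % (2.0 * np.pi)
    rho_hat = min(np.hypot(c, s), 0.5)  # keep the estimate inside the parameter space
    return mu_hat, rho_hat


rng = np.random.default_rng(1)
sample = sample_cardioid(20000, mu=1.0, rho=0.3, rng=rng)
mu_hat, rho_hat = moment_estimates(sample)
print(mu_hat, rho_hat)
```

The truncation at 0.5 is one simple way to respect the parameter bound; the paper's treatment of boundary cases may differ.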
Computational Statistics & Data Analysis, Volume 213, Article 108248
Citations: 0
Variable selection in AUC-optimizing classification
IF 1.6 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2025-08-05 | DOI: 10.1016/j.csda.2025.108256
Hyungwoo Kim, Seung Jun Shin
Optimizing the receiver operating characteristic (ROC) curve is a popular way to evaluate a binary classifier under imbalanced scenarios frequently encountered in practice. A practical approach to constructing a linear binary classifier is presented by simultaneously optimizing the area under the ROC curve (AUC) and selecting informative variables in high dimensions. In particular, the smoothly clipped absolute deviation (SCAD) penalty is employed, and its oracle property is established, which enables the development of a consistent BIC-type information criterion that greatly facilitates the tuning procedure. Both simulated and real data analyses demonstrate the promising performance of the proposed method in terms of AUC optimization and variable selection.
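The two ingredients named in this abstract can be sketched in a few lines of Python. The empirical AUC is the fraction of positive/negative pairs ranked correctly (ties counted as one half), and the SCAD penalty is written in its usual three-piece form with the conventional default a = 3.7. The function names are hypothetical, and the paper's actual objective (presumably a smooth surrogate of the AUC combined with the penalty) is not reproduced here.

```python
import numpy as np


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty evaluated elementwise (linear, quadratic, then constant)."""
    t = np.abs(np.asarray(beta, dtype=float))
    quadratic = (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0))
    constant = lam**2 * (a + 1.0) / 2.0
    return np.where(t <= lam, lam * t, np.where(t <= a * lam, quadratic, constant))


auc = empirical_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
penalties = scad_penalty([0.5, 2.0, 10.0], lam=1.0)
print(auc, penalties)
```

Unlike the LASSO penalty, the SCAD penalty is flat for large coefficients, so strong signals are not shrunk; this is the property usually associated with the oracle behaviour mentioned in the abstract.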
Computational Statistics & Data Analysis, Volume 213, Article 108256
Citations: 0
Variable selection for spatio-temporal conditionally Poisson point processes
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-06-27 | DOI: 10.1016/j.csda.2025.108238
Achmad Choiruddin, Jonatan A. González, Jorge Mateu, Alwan Fadlurohman, Rasmus Waagepetersen
Spatio-temporal point pattern data are becoming prevalent in many scientific disciplines. We consider a sequence of spatial point processes where each point process is Poisson given the past. We model the conditional first-order intensity function of each point process as a parametric log-linear function of spatial, temporal, and spatio-temporal covariates that may depend on previous point patterns. Dealing with spatio-temporal covariates brings computational and methodological challenges compared to the purely spatial case. We extend regularisation methods for spatial point process variable selection to obtain parsimonious and interpretable models in the considered spatio-temporal case. Using our proposed methodology, we conduct two simulation studies and examine an application to criminal activity in the Kennedy district of Bogota. In the application, we consider spatio-temporal point pattern data of crime locations and a number of spatial, temporal, and spatio-temporal covariates related to urban places, environmental factors, and further space-time factors. The intensity function of vehicle thefts is estimated, considering other crimes as covariate information. The proposed methodology offers a comprehensive approach for analysing spatio-temporal point pattern crime data, capturing complex relationships between covariates and crime occurrences over space and time.
Computational Statistics & Data Analysis, Volume 212, Article 108238
Citations: 0
New tests for the identity and sphericity of high-dimensional covariance matrices via U-statistics
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-05 | DOI: 10.1016/j.csda.2025.108242
Xiaoge Xiong
Two novel test procedures are proposed for the identity and sphericity of covariance matrices in high-dimensional asymptotic frameworks, both constructed via U-statistics. The limiting distributions of these tests are established under null and local alternative hypotheses. Monte Carlo simulation results demonstrate their superiority over several competing methods across various scenarios, with the proposed tests achieving full power against both dense and sparse alternatives. The effectiveness of the proposed tests is further validated through an application to a colon dataset.
Computational Statistics & Data Analysis, Volume 212, Article 108242
Citations: 0
High-dimensional response growth curve modeling for longitudinal neuroimaging analysis
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-01 | Epub Date: 2025-07-07 | DOI: 10.1016/j.csda.2025.108239
Lu Wang, Xiang Lyu, Lexin Li
There is increasing interest in modeling high-dimensional longitudinal outcomes in applications such as developmental neuroimaging research. The growth curve model offers a useful tool to capture both the mean growth pattern across individuals and the dynamic changes of outcomes over time within each individual. However, when the number of outcomes is large, it becomes challenging and often infeasible to tackle the large covariance matrix of the random effects involved in the model. A high-dimensional response growth curve model, with three novel components, is proposed: a low-rank factor model structure that substantially reduces the number of parameters in the large covariance matrix, a re-parameterization formulation coupled with a sparsity penalty that selects important fixed and random effect terms, and a computational trick that turns the inversion of a large matrix into the inversion of a stack of small matrices and thus considerably speeds up the computation. An efficient expectation-maximization-type estimation algorithm is developed, and the competitive performance of the proposed method is demonstrated through both simulations and a longitudinal study of brain structural connectivity in association with human immunodeficiency virus.
Computational Statistics & Data Analysis, Volume 212, Article 108239
Citations: 0
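The abstract above mentions a low-rank factor structure that shrinks the parameter count of a large covariance matrix and makes its inversion cheap. The abstract does not spell out the authors' exact trick, so the sketch below illustrates only the general idea with the standard Woodbury identity, under the assumption of a covariance of the form Sigma = L L^T + s2 * I. The names `low_rank_cov`, `woodbury_inverse`, `loadings`, and `noise_var` are hypothetical and chosen for illustration.

```python
import numpy as np


def low_rank_cov(loadings, noise_var):
    """Build Sigma = L L^T + noise_var * I for a p x r loading matrix L."""
    p = loadings.shape[0]
    return loadings @ loadings.T + noise_var * np.eye(p)


def woodbury_inverse(loadings, noise_var):
    """Invert Sigma by solving only an r x r system (Woodbury identity).

    Sigma^{-1} = (I - L (noise_var * I_r + L^T L)^{-1} L^T) / noise_var
    """
    p, r = loadings.shape
    small = noise_var * np.eye(r) + loadings.T @ loadings  # r x r, not p x p
    correction = loadings @ np.linalg.solve(small, loadings.T)
    return (np.eye(p) - correction) / noise_var


# Illustrative sizes (assumed): 200 outcomes, 3 latent factors.
rng = np.random.default_rng(0)
p, r = 200, 3
loadings = rng.normal(size=(p, r))
noise_var = 0.5

sigma = low_rank_cov(loadings, noise_var)
sigma_inv = woodbury_inverse(loadings, noise_var)

# Largest deviation of Sigma @ Sigma^{-1} from the identity matrix.
max_err = np.max(np.abs(sigma @ sigma_inv - np.eye(p)))

# Free parameters: unstructured symmetric matrix vs. the factor structure.
n_free_full = p * (p + 1) // 2
n_free_low_rank = p * r + 1
```

With these assumed sizes the factor structure needs 601 parameters instead of 20100, and the only matrix that is actually inverted is 3 x 3.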
High-dimensional and banded integer-valued autoregressive processes
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-07-04 DOI: 10.1016/j.csda.2025.108243
Nuo Xu, Kai Yang
The modeling of high-dimensional time series has always been an appealing and challenging problem. The main difficulties lie in the curse of dimensionality and the complex cross dependence between adjacent components. To address these problems for high-dimensional time series of counts, a class of high-dimensional and banded integer-valued autoregressive processes that does not assume a distribution for the innovations is proposed. A banded thinning structure is constructed to reduce the dimension of the parameters. Componentwise conditional least squares and weighted conditional least squares methods are developed to estimate the banded autoregressive coefficient matrices. The bandwidth parameter is identified via a marginal Bayesian information criterion method. Numerical results are provided to show the good performance of the estimators. Finally, the superiority of the proposed model is shown by an application to an air quality data set of different cities.
Computational Statistics & Data Analysis, Volume 212, Article 108243. Publication date: 2025-12-01.
Citations: 0
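The abstract above describes a banded thinning structure estimated by componentwise conditional least squares. The abstract does not give the exact model equations, so the following is a minimal sketch under assumptions: a first-order process built from binomial thinning, a coefficient matrix that is zero outside a band of half-width `bandwidth`, and Poisson innovations used only to generate data (the estimator itself does not use the innovation distribution). The function names `simulate_banded_inar` and `componentwise_cls` are hypothetical, and the weighted variant and the bandwidth selection step are not shown.

```python
import numpy as np


def simulate_banded_inar(alpha, innov_mean, n_steps, rng):
    """Simulate X_t[i] = sum_j Binomial(X_{t-1}[j], alpha[i, j]) + innovation."""
    p = alpha.shape[0]
    x = np.zeros((n_steps, p), dtype=np.int64)
    x[0] = rng.poisson(innov_mean, size=p)
    for t in range(1, n_steps):
        # thinned[i, j] ~ Binomial(x[t-1, j], alpha[i, j]); zero outside the band
        thinned = rng.binomial(x[t - 1][None, :], alpha)
        x[t] = thinned.sum(axis=1) + rng.poisson(innov_mean, size=p)
    return x


def componentwise_cls(x, bandwidth):
    """Regress each component on its in-band neighbours at the previous step."""
    n_steps, p = x.shape
    alpha_hat = np.zeros((p, p))
    mean_hat = np.zeros(p)
    for i in range(p):
        lo, hi = max(0, i - bandwidth), min(p, i + bandwidth + 1)
        design = np.column_stack([np.ones(n_steps - 1), x[:-1, lo:hi]])
        target = x[1:, i].astype(float)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        mean_hat[i] = coef[0]          # estimated innovation mean
        alpha_hat[i, lo:hi] = coef[1:]  # estimated in-band coefficients
    return alpha_hat, mean_hat


# Illustrative setup (assumed values): 6 components, band half-width 1.
p, bandwidth = 6, 1
idx = np.arange(p)
dist = np.abs(idx[:, None] - idx[None, :])
alpha = np.where(dist == 0, 0.3, np.where(dist <= bandwidth, 0.15, 0.0))

rng = np.random.default_rng(1)
x = simulate_banded_inar(alpha, innov_mean=2.0, n_steps=5000, rng=rng)
alpha_hat, mean_hat = componentwise_cls(x, bandwidth)
```

Each row is fitted separately with at most `2 * bandwidth + 1` coefficients plus an intercept, so the number of parameters grows linearly with the number of components rather than quadratically.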