
Computational Statistics & Data Analysis: Latest Articles

Estimation of semiparametric probit model based on case-cohort interval-censored failure time data
IF 1.6 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-08-10 DOI: 10.1016/j.csda.2025.108266
Mingyue Du, Ricong Zeng
Estimation of the semiparametric probit model is discussed for the situation where one observes interval-censored failure time data arising from case-cohort studies. The probit model has recently attracted some attention for regression analysis of failure time data, partly due to the popularity of the normal distribution and its similarity to linear models. Although some methods have been developed in the literature for its estimation, there does not seem to be an established approach for case-cohort interval-censored data. To address this, a pseudo-maximum likelihood method is proposed, and an EM algorithm is developed for its implementation. The resulting estimators of the regression parameters are shown to be consistent and asymptotically normal. To assess the empirical performance of the proposed method, a simulation study is conducted; it indicates that the method works well in practical situations. In addition, the method is applied to a set of real data arising from an AIDS clinical trial that motivated this study.
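As a rough illustration of the kind of objective a pseudo-maximum likelihood method works with, the sketch below evaluates a weighted log-likelihood for interval-censored observations under a probit-type model F(t | x) = Φ(α(t) + xᵀβ). The model form, the inverse-probability weights, and all function names are assumptions made for illustration, not the authors' formulation; the values of α at the interval endpoints are passed in directly rather than estimated, and no EM step is shown.

```python
import math

def norm_cdf(z):
    """Standard normal CDF; also returns 0 or 1 for -inf or +inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def weighted_probit_loglik(beta, records):
    """Weighted log-likelihood for interval-censored observations.

    Each record is (alpha_left, alpha_right, x, weight). alpha_left and
    alpha_right are assumed-known values of a monotone function alpha at the
    interval endpoints L and R; use -inf / +inf for left- or right-censoring.
    weight is a hypothetical inverse sampling probability reflecting the
    case-cohort design. Assumes every interval has positive probability.
    """
    total = 0.0
    for a_left, a_right, x, w in records:
        lin = sum(b * xi for b, xi in zip(beta, x))
        # P(L < T <= R | x) = Phi(alpha(R) + x'beta) - Phi(alpha(L) + x'beta)
        prob = norm_cdf(a_right + lin) - norm_cdf(a_left + lin)
        total += w * math.log(prob)
    return total
```

In practice such an objective would be maximized jointly over β and a flexible representation of α; here it only shows how interval endpoints and sampling weights enter each term.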
Citations: 0
Estimating a smooth covariance for functional data
IF 1.6 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-08-06 DOI: 10.1016/j.csda.2025.108255
Uche Mbaka, James Owen Ramsay, Michelle Carey
Functional data analysis frequently involves estimating a smooth covariance function based on observed data. This estimation is essential for understanding interactions among functions and constitutes a fundamental aspect of numerous advanced methodologies, including functional principal component analysis. Two approaches for estimating smooth covariance functions in the presence of measurement errors are introduced. The first method employs a low-rank approximation of the covariance matrix, while the second ensures positive definiteness via a Cholesky decomposition. Both approaches use penalized regression to produce smooth covariance estimates and have been validated through comprehensive simulation studies. The practical application of these methods is demonstrated through the examination of average weekly milk yields in dairy cows as well as egg-laying patterns of Mediterranean fruit flies.
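To make the Cholesky idea concrete, here is a minimal sketch of why parameterizing a covariance matrix through a lower-triangular factor keeps it valid: any real-valued factor L gives a symmetric positive semidefinite product L Lᵀ, and a nonzero diagonal makes it positive definite. The function names are hypothetical, and the smoothing penalty and measurement-error handling described in the abstract are not shown.

```python
def covariance_from_cholesky(params, dim):
    """Build Sigma = L L^T from the dim*(dim+1)/2 free entries of L."""
    # Fill the lower triangle of L row by row.
    L = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):
        for j in range(i + 1):
            L[i][j] = params[k]
            k += 1
    # Entry (i, j) of L L^T is the inner product of rows i and j of L.
    return [[sum(L[i][m] * L[j][m] for m in range(dim)) for j in range(dim)]
            for i in range(dim)]

def quadratic_form(sigma, v):
    """Compute v^T Sigma v, which is nonnegative for any valid covariance."""
    n = len(v)
    return sum(v[i] * sigma[i][j] * v[j] for i in range(n) for j in range(n))
```

An optimizer can then search over the unconstrained entries of L without ever leaving the set of valid covariance matrices.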
Citations: 0
Variable selection in AUC-optimizing classification
IF 1.6 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-08-05 DOI: 10.1016/j.csda.2025.108256
Hyungwoo Kim, Seung Jun Shin
Optimizing the receiver operating characteristic (ROC) curve is a popular way to evaluate a binary classifier under imbalanced scenarios frequently encountered in practice. A practical approach to constructing a linear binary classifier is presented by simultaneously optimizing the area under the ROC curve (AUC) and selecting informative variables in high dimensions. In particular, the smoothly clipped absolute deviation (SCAD) penalty is employed, and its oracle property is established, which enables the development of a consistent BIC-type information criterion that greatly facilitates the tuning procedure. Both simulated and real data analyses demonstrate the promising performance of the proposed method in terms of AUC optimization and variable selection.
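The two ingredients named in the abstract can be written down compactly. The sketch below computes the empirical AUC as the fraction of correctly ordered positive/negative pairs and evaluates the standard SCAD penalty of Fan and Li for a single coefficient. It is only an illustration of these two quantities: the function names and the default a = 3.7 are assumptions, and the optimization procedure and BIC-type criterion from the paper are not reproduced.

```python
def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient (requires lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Behaves like the lasso penalty near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant for large coefficients, so they are not shrunk further.
    return lam * lam * (a + 1.0) / 2.0
```

The flat tail of the penalty is what lets large coefficients escape shrinkage, which is the usual intuition behind oracle-type results for SCAD.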
Citations: 0
Random effects misspecification and its consequences for prediction in generalized linear mixed models
IF 1.6 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-29 DOI: 10.1016/j.csda.2025.108254
Quan Vu, Francis K.C. Hui, Samuel Muller, A.H. Welsh
When fitting generalized linear mixed models, choosing the random effects distribution is an important decision. As random effects are unobserved, misspecification of their distribution is a real possibility. Thus, the consequences of random effects misspecification for point prediction and prediction inference of random effects in generalized linear mixed models need to be investigated. A combination of theory, simulation, and a real application is used to explore the effect of using the common normality assumption for the random effects distribution when the correct specification is a mixture of normal distributions, focusing on the impacts on point prediction, mean squared prediction errors, and prediction intervals. Results show that the level of shrinkage for the predicted random effects can differ greatly under the two random effect distributions, and so is susceptible to misspecification. Also, the unconditional mean squared prediction errors for the random effects are almost always larger under the misspecified normal random effects distribution, while results for the mean squared prediction errors conditional on the random effects are more complicated but remain generally larger under the misspecified distribution (especially when the true random effect is close to the mean of one of the component distributions in the true mixture distribution). Results for prediction intervals indicate that the overall coverage probability is, in contrast, not greatly impacted by misspecification. It is concluded that misspecifying the random effects distribution can affect prediction of random effects, and greater caution is recommended when adopting the normality assumption in generalized linear mixed models.
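The difference in shrinkage can be seen in a deliberately simplified Gaussian setting, which is an assumption made here for tractability and not the generalized linear mixed model setting of the paper. Suppose a cluster summary ybar is distributed as N(b, s2) given the random effect b. Under a normal prior the predicted effect is a fixed linear shrinkage of ybar toward the prior mean, whereas under a mixture-of-normals prior it is a data-dependent blend of component-wise shrinkages. All names below are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predict_normal(ybar, s2, mu, tau2):
    """Posterior mean of b given ybar ~ N(b, s2) and prior b ~ N(mu, tau2)."""
    shrink = tau2 / (tau2 + s2)
    return mu + shrink * (ybar - mu)

def predict_mixture(ybar, s2, components):
    """Posterior mean of b under a mixture-of-normals prior.

    components is a list of (weight, mean, variance) triples.
    """
    # Posterior component weights use the marginal density of ybar per component.
    post_w = [w * normal_pdf(ybar, m, v + s2) for w, m, v in components]
    total = sum(post_w)
    return sum(pw / total * predict_normal(ybar, s2, m, v)
               for pw, (w, m, v) in zip(post_w, components))
```

Comparing the two predictors for the same ybar shows how a misspecified normal prior can pull predictions toward an overall mean that lies between the true mixture components.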
Citations: 0
Bayesian optimization sequential surrogate (BOSS) algorithm: Fast Bayesian inference for a broad class of Bayesian hierarchical models
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-23 DOI: 10.1016/j.csda.2025.108253
Dayi Li, Ziang Zhang
Approximate Bayesian inference based on Laplace approximation and quadrature has become increasingly popular for its efficiency in fitting latent Gaussian models (LGM). However, many useful models can only be fitted as LGMs if some conditioning parameters are fixed. Such models are termed conditional LGMs, with examples including change-point detection, non-linear regression, and many others. Existing methods for fitting conditional LGMs rely on grid search or sampling-based approaches to explore the posterior density of the conditioning parameters; both require a large number of evaluations of the unnormalized posterior density of the conditioning parameters. Since each evaluation requires fitting a separate LGM, these methods become computationally prohibitive beyond simple scenarios. In this work, the Bayesian Optimization Sequential Surrogate (BOSS) algorithm is introduced, which combines Bayesian optimization with approximate Bayesian inference methods to significantly reduce the computational resources required for fitting conditional LGMs. With orders of magnitude fewer evaluations than those required by the existing methods, BOSS efficiently generates sequential design points that capture the majority of the posterior mass of the conditioning parameters and subsequently yields an accurate surrogate posterior distribution that can be easily normalized. The efficiency, accuracy, and practical utility of BOSS are demonstrated through extensive simulation studies and real-world applications in epidemiology, environmental sciences, and astrophysics.
Citations: 0
GMM estimation of fixed effects partially linear additive SAR model with space-time correlated disturbances
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-22 DOI: 10.1016/j.csda.2025.108252
Bogui Li, Jianbao Chen
To study the space-time panel data that are ubiquitous in the real world, a fixed effects partially linear additive spatial autoregressive (SAR) model with space-time correlated disturbances is proposed. Compared to the linear panel model with space-time correlated disturbances, it can simultaneously capture substantial spatial dependence of the response, linearity and nonlinearity between the response and regressors, and spatial and serial correlations of disturbances, while avoiding the “curse of dimensionality” of nonparametric regression. By using B-splines to fit the additive components and constructing linear and quadratic moment conditions which incorporate information in the disturbances, generalized method of moments (GMM) estimators of the unknown parameters and additive components are obtained. Under certain regularity assumptions, it is proved that the GMM estimators are consistent and asymptotically normal. Furthermore, the asymptotically efficient best GMM estimators under normality are derived. Monte Carlo simulation and empirical analysis illustrate that the developed estimation method has good finite sample performance and application prospects.
Citations: 0
Inferring the dynamics of quasi-reaction systems via nonlinear local mean-field approximations
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-22 DOI: 10.1016/j.csda.2025.108251
Matteo Framba, Veronica Vinciotti, Ernst C. Wit
Parameter estimation of kinetic rates in stochastic quasi-reaction systems can be challenging, particularly when the time gap between consecutive measurements is large. Local linear approximation approaches account for the stochasticity in the system but fail to capture the intrinsically nonlinear nature of the mean dynamics of the process. Moreover, the mean dynamics of a quasi-reaction system can be described by a system of ODEs, which have an explicit solution only for simple unitary systems. An approximate analytical solution is derived for generic quasi-reaction systems via a first-order Taylor approximation of the hazard rate. This allows a nonlinear forward prediction of the future dynamics given the current state of the system. Predictions and corresponding observations are embedded in a nonlinear least-squares approach for parameter estimation. The performance of the algorithm is compared to existing methods via a simulation study. Besides the generality of the approach in the specification of the quasi-reaction system and the gains in computational efficiency, the results show an improvement in the kinetic rate estimation, particularly for data observed at large time intervals. Additionally, the availability of an explicit solution makes the method robust to stiffness, which is often present in biological systems. Application to Rhesus Macaque data illustrates the use of the method in the study of cell differentiation.
Citations: 0
Sample-specific cooperative learning integrating heterogeneous radiomics and pathomics data
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-21 DOI: 10.1016/j.csda.2025.108250
Shih-Ting Huang, Graham A. Colditz, Shu Jiang
Multi-omics analysis offers unparalleled insights into the interlinked molecular interactions that govern the underlying biological processes. In the era of big data, driven by the emergence of high-throughput technologies, it is possible to gain a more comprehensive and detailed understanding of complex systems. Nevertheless, the challenges lie in developing methods to effectively integrate and analyze this wealth of data. This challenge is even more apparent when the type of -omics data (e.g., pathomics) lacks pixel-to-pixel or region-to-region correspondence across the population. A novel sample-specific cooperative learning framework is introduced, designed to adaptively manage diverse multi-omics data types, even when there is no direct correspondence between regions. The proposed framework is defined for both continuous and categorical outcomes, with theoretical guarantees based on finite samples. Model performance is demonstrated and compared with existing methods using real-world datasets involving proteomics and metabolomics, and radiomics and pathomics.
Citations: 0
Boosting interaction tree stumps for modeling interactions
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-16 DOI: 10.1016/j.csda.2025.108247
Michael Lau, Tamara Schikowski, Holger Schwender
Incorporating interaction effects is essential for accurately modeling complex underlying relationships in many applications. Often, not only strong predictive performance is desired, but also the interpretability of the resulting model. This need is evident in areas such as epidemiology, in which uncovering the interplay of biological mechanisms is critical for understanding complex diseases. Classical linear models, frequently used for constructing genetic risk scores, fail to capture interaction effects autonomously, while modern machine learning methods such as gradient boosting often produce black-box models that lack interpretability. Existing linear interaction models are largely limited to considering two-way interactions. To address these limitations, a novel statistical learning method, BITS (Boosting Interaction Tree Stumps), is introduced to construct linear models while autonomously detecting and incorporating interaction effects. BITS uses gradient boosting on interaction tree stumps, i.e., decision trees with a single split, where in BITS this split can possibly occur on an interaction term. A branch-and-bound approach is employed in BITS to discard weakly predictive terms. For high-dimensional data, a hybrid search strategy combining greedy and exhaustive approaches is proposed. Regularization techniques are integrated to prevent overfitting and the inclusion of spurious interaction effects. Simulation studies and real data applications demonstrate that BITS produces interpretable models with strong predictive performance. Moreover, in the simulation study, BITS primarily identifies truly influential terms.
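One way to picture a single boosting step on an interaction tree stump is sketched below for binary features: each candidate term is either a single feature or a product of features, and the stump splits the current residuals on whether that term equals 1. This is a simplified illustration under stated assumptions. It searches all terms exhaustively up to a hypothetical `max_order`, whereas the abstract describes a branch-and-bound search with regularization, and the function name is not taken from the paper.

```python
from itertools import combinations

def fit_interaction_stump(X, residuals, max_order=2):
    """Return (sse, term, mean_if_0, mean_if_1) for the best one-split stump.

    X is a list of rows of 0/1 features; term is a tuple of feature indices
    whose product defines the split. Returns None if no term splits the data.
    """
    p = len(X[0])
    best = None
    for order in range(1, max_order + 1):
        for term in combinations(range(p), order):
            # The interaction term is 1 only when every feature in it is 1.
            z = [int(all(row[j] == 1 for j in term)) for row in X]
            left = [r for r, zi in zip(residuals, z) if zi == 0]
            right = [r for r, zi in zip(residuals, z) if zi == 1]
            if not left or not right:
                continue  # the term does not split the data
            m0 = sum(left) / len(left)
            m1 = sum(right) / len(right)
            sse = (sum((r - m0) ** 2 for r in left)
                   + sum((r - m1) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, term, m0, m1)
    return best
```

Because each stump contributes one term with one coefficient, summing the stumps over boosting iterations yields a linear model in the selected main effects and interactions, which is what keeps the result interpretable.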
Citations: 0
On Jeffreys's cardioid distribution
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-07-16 DOI: 10.1016/j.csda.2025.108248
Arthur Pewsey
The cardioid distribution, despite being one of the fundamental models for circular data, has received limited attention both methodologically and in terms of its implementation in R. To redress these shortcomings, published results on the model are summarized, corrected and extended, and the scope and limitations of the existing support for the model in R identified. A thorough investigation into the performance of trigonometric moment and maximum likelihood based approaches to point and interval estimation of the model's location and concentration parameters is presented, and goodness-of-fit techniques outlined. A suite of reliable R functions is provided for the model's practical application. The application of the proposed inferential methods and R functions is illustrated by an analysis of palaeocurrent cross-bed azimuths.
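For orientation, the cardioid density with location μ and concentration ρ (|ρ| ≤ 1/2) is f(θ) = (1 + 2ρ cos(θ − μ)) / (2π), and its first trigonometric moment has mean direction μ and mean resultant length ρ. The sketch below uses that fact for trigonometric moment point estimation. It is written in Python rather than R, is not the paper's function suite, and covers neither interval estimation nor goodness-of-fit; the function names and the rejection sampler are illustrative assumptions.

```python
import math
import random


def cardioid_pdf(theta, mu, rho):
    """Cardioid density (1 + 2*rho*cos(theta - mu)) / (2*pi), valid for |rho| <= 1/2."""
    if abs(rho) > 0.5:
        raise ValueError("rho must satisfy |rho| <= 0.5")
    return (1.0 + 2.0 * rho * math.cos(theta - mu)) / (2.0 * math.pi)


def moment_estimates(angles):
    """Trigonometric moment estimates of (mu, rho) from angles in radians.

    mu_hat is the sample mean direction; rho_hat is the sample mean
    resultant length, capped at 0.5 to stay inside the parameter space.
    """
    n = len(angles)
    c_bar = sum(math.cos(a) for a in angles) / n
    s_bar = sum(math.sin(a) for a in angles) / n
    mu_hat = math.atan2(s_bar, c_bar) % (2.0 * math.pi)
    rho_hat = min(math.hypot(c_bar, s_bar), 0.5)
    return mu_hat, rho_hat


def sample_cardioid(n, mu, rho, rng):
    """Draw n angles by rejection sampling from a uniform proposal on [0, 2*pi)."""
    bound = 1.0 + 2.0 * abs(rho)  # maximum of 1 + 2*rho*cos(.)
    out = []
    while len(out) < n:
        theta = rng.uniform(0.0, 2.0 * math.pi)
        if rng.uniform(0.0, bound) <= 1.0 + 2.0 * rho * math.cos(theta - mu):
            out.append(theta)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    data = sample_cardioid(5000, mu=1.0, rho=0.3, rng=rng)
    print(moment_estimates(data))
```

The cap at 0.5 matters in small samples or for strongly concentrated data, where the sample mean resultant length can exceed the largest value the cardioid model allows.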
Citations: 0