
Computational Statistics & Data Analysis: Latest Articles

Sequential hierarchical Bayesian model and particle filter estimation with two-step RJMCMC resampling
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-17 DOI: 10.1016/j.csda.2025.108304
Yue Huan, Guoqiang Wang, Hai Xiang Lin
Data assimilation (DA) combines numerical model simulations with observed data to obtain the best possible description of a dynamical system and its uncertainty. Incorrect modeling assumptions can lead to filter divergence, making model identification an important issue in the field of DA. Variations in dynamic model structures can result in differences in parameter dimensions, complicating the resampling step in particle filters (PFs). To meet this challenge, the Sequential Hierarchical Bayesian Model (SHBM) is proposed in this paper, which integrates the evolution and observation models of the DA scheme with a hierarchical parameter model. A two-step resampling method is also proposed to estimate the SHBM: the first step uses the resampling scheme of the bootstrap filter to resample new particles based on weights, which may produce some duplicate particles; the second step uses the Reversible Jump Markov Chain Monte Carlo (RJMCMC) method to draw new particles from the target distribution. This approach ensures particle diversity: the first step avoids particle degeneracy, and the second step prevents sample impoverishment. Results on an advection equation example and a Lorenz 96 example demonstrate the effectiveness of the proposed method.
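The two-step idea in this abstract (weight-based resampling followed by an MCMC move that restores diversity) can be illustrated with a minimal sketch. This is not the authors' algorithm: the move step below is a fixed-dimension random-walk Metropolis-Hastings update swapped in for RJMCMC, and the prior, the target density, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Step 1: bootstrap-filter resampling; draws indices in proportion to the weights."""
    n = len(weights)
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    return particles[idx]

def mh_move(particles, log_target, step, rng):
    """Step 2 stand-in: one random-walk Metropolis-Hastings move per particle.
    Fixed dimension only; the paper uses RJMCMC so that the dimension can change."""
    proposal = particles + step * rng.standard_normal(particles.shape)
    log_ratio = log_target(proposal) - log_target(particles)
    accept = np.log(rng.uniform(size=len(particles))) < log_ratio
    return np.where(accept, proposal, particles)

rng = np.random.default_rng(0)
n = 500
particles = rng.normal(0.0, 3.0, size=n)        # hypothetical prior draws from N(0, 9)
log_target = lambda x: -0.5 * (x - 1.0) ** 2    # hypothetical target N(1, 1), up to a constant
# Importance weights: target density divided by the N(0, 9) prior density.
log_w = log_target(particles) + 0.5 * (particles / 3.0) ** 2
weights = np.exp(log_w - log_w.max())

resampled = multinomial_resample(particles, weights, rng)
moved = mh_move(resampled, log_target, step=0.5, rng=rng)

n_unique_resampled = len(np.unique(resampled))  # duplicates remain after step 1
n_unique_moved = len(np.unique(moved))          # accepted moves break the duplicates
```

Comparing `n_unique_resampled` with `n_unique_moved` shows the effect the abstract describes: resampling alone leaves repeated particles, while the move step spreads them out again without changing the target distribution.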
Computational Statistics & Data Analysis, Volume 216, Article 108304
Citations: 0
A semi-parametric approach to receiver operating characteristic analysis with semi-continuous biomarker
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-15 DOI: 10.1016/j.csda.2025.108305
Baohao Wei, Dongsheng Tu, Chunlin Wang
The receiver operating characteristic (ROC) curve and its summary measures, such as the area under the curve (AUC) and Youden index, are frequently used to evaluate the performance of a binary classifier based on data of a continuous biomarker and meanwhile identify a suitable cut-off point for classification. In clinical applications, the biomarker used for classification may be semi-continuous in the sense that the observations contain excess zero values and the distribution of the positive values is skewed. In this paper, the distribution of a semi-continuous biomarker is modeled using a mixture of a discrete mass at zero and a continuous skewed positive component. In addition, the distributions of the continuous component in subjects with true negative and positive outcomes are linked by a semi-parametric density ratio model to gain efficiency. Under this framework, unified estimation and inference procedures are proposed for the ROC curve, its important summary measures, and the associated cut-off point. The asymptotic properties of the proposed semi-parametric estimators are established and used to construct their corresponding confidence intervals. Simulation results demonstrate the desirable performance of these estimators and confidence intervals in various settings. The proposed semi-parametric approach is also applied to assess the semi-continuous BRCA1 biomarker as a valid prognostic biomarker for predicting cancer progression at 4 years and identifying a cut-off point to classify patients with advanced ovarian cancer into two groups with good and bad prognoses.
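For readers unfamiliar with the summary measures named here, the following sketch computes the plain empirical AUC and Youden index for a biomarker with excess zeros. It is only a baseline under stated assumptions: it does not implement the semi-parametric density ratio model of the paper, the function name and toy data are hypothetical, and it assumes that larger biomarker values indicate a positive outcome.

```python
import numpy as np

def empirical_roc_summary(neg, pos):
    """Empirical AUC and Youden index; larger biomarker values indicate a positive outcome.
    Ties (for example, many zeros in both groups) count as 1/2 in the AUC."""
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    diff = pos[:, None] - neg[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    # Classify a subject as positive when the biomarker exceeds the cut-off c.
    cutoffs = np.unique(np.concatenate([neg, pos]))
    sens = np.array([(pos > c).mean() for c in cutoffs])
    spec = np.array([(neg <= c).mean() for c in cutoffs])
    j = sens + spec - 1.0
    best = int(np.argmax(j))
    return auc, j[best], cutoffs[best]

# Toy semi-continuous data: a point mass at zero plus positive values.
neg = [0, 0, 0, 1, 1]
pos = [0, 2, 3, 4, 5]
auc, youden, cutoff = empirical_roc_summary(neg, pos)
# auc = 0.86, youden = 0.8 at cutoff = 1.0
```

The tie handling matters for semi-continuous data: zeros shared by both groups contribute 1/2 each to the AUC rather than being dropped.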
Computational Statistics & Data Analysis, Volume 216, Article 108305
Citations: 0
Hypothesis test in high dimensional multi-response linear models
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-10 DOI: 10.1016/j.csda.2025.108303
Yuan Ke, Rongmao Zhang, Wenyang Zhang, Changliang Zou
Data with multiple responses is very common in economics, engineering, finance, and social science. Analyzing each response variable separately may not be a good strategy as this approach can overlook important information and lead to suboptimal results. In some cases, it may not even provide an answer to the question of interest. Multi-response linear models serve as an important tool for joint analysis. While the methodology and theory of classic multi-response linear models are well-established, they may not be applicable to high-dimensional cases. In this paper, we propose a powerful hypothesis test for the coefficient matrix of a high-dimensional multi-response linear model. We establish asymptotic results and conduct comprehensive simulation studies to demonstrate that the proposed hypothesis test is more powerful than alternative methods. Furthermore, we apply the hypothesis test to two real datasets, illustrating its usefulness in addressing practical problems.
Computational Statistics & Data Analysis, Volume 215, Article 108303
Citations: 0
Modelling catastrophic extinction in stochastic birth-death process: Analytical insights, estimation, and efficient simulation
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-07 DOI: 10.1016/j.csda.2025.108302
Clement Twumasi
A comprehensive analytical and computational framework is developed for the linear birth-death process (LBDP) with catastrophic extinction (BDC process), a continuous-time Markov model that incorporates sudden extinction events into the classical LBDP. Despite its conceptual simplicity, the underlying BDC process poses substantial challenges in deriving exact transition probabilities and performing reliable parameter estimation, particularly under discrete-time observations. While previous work established foundational properties using spectral methods and probability generating functions (PGFs), explicit analytical expressions for transition probabilities and theoretical moments have remained unavailable, limiting practical applications in extinction-prone systems. This limitation is addressed by reparameterising the PGF through functional restructuring, yielding exact closed-form expressions for the transition probability function and the theoretical moments of the discretely observed BDC process, with results validated through comprehensive numerical experiments for the first time. Three parameter estimation approaches tailored to the BDC process are introduced and evaluated: maximum likelihood estimation (MLE), generalised method of moments (GMM), and an embedded Galton-Watson (GW) approach, with trade-offs between computational efficiency and estimation accuracy examined across diverse simulation scenarios. To improve scalability, a Monte Carlo simulation framework based on a hybrid tau-leaping algorithm is formulated, specifically adapted to extinction-driven dynamics, offering a computationally efficient alternative to the exact stochastic simulation algorithm (SSA). 
The proposed methodologies offer a tractable and scalable foundation for incorporating the BDC process into applied stochastic models, particularly in ecological, epidemiological, and biological systems where populations are susceptible to sudden collapse due to catastrophic events such as host mortality or immune response.
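The exact stochastic simulation algorithm (SSA) that the abstract uses as its baseline can be sketched as follows. The rate structure is an assumption made for illustration, not taken from the paper: births, deaths, and catastrophes are each given a per-capita rate, and a catastrophe sends the population straight to zero. The function name and parameter values are hypothetical.

```python
import numpy as np

def simulate_bdc(n0, birth, death, catastrophe, t_end, rng):
    """Exact SSA (Gillespie) path of a linear birth-death process with catastrophe.
    Assumed per-capita rates: birth, death, catastrophe; a catastrophe sets the population to 0."""
    t, n = 0.0, n0
    while n > 0:
        total = (birth + death + catastrophe) * n
        t += rng.exponential(1.0 / total)   # waiting time to the next event
        if t > t_end:
            break
        u = rng.uniform() * total           # choose which event fires
        if u < birth * n:
            n += 1
        elif u < (birth + death) * n:
            n -= 1
        else:
            n = 0                           # catastrophic extinction
    return n

rng = np.random.default_rng(1)
finals = np.array([simulate_bdc(10, 0.5, 0.3, 0.05, 2.0, rng) for _ in range(2000)])
extinct_fraction = np.mean(finals == 0)
```

Because the SSA simulates every event, its cost grows with the population size; this is the motivation for approximate schemes such as the tau-leaping approach mentioned in the abstract, which is not shown here.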
Computational Statistics & Data Analysis, Volume 215, Article 108302
Citations: 0
Transfer learning for high dimensional data with discrete responses
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-16 DOI: 10.1016/j.csda.2025.108292
Zejing Zheng, Shengbing Zheng, Junlong Zhao
Discrete responses are frequently encountered in applications, particularly in classification problems. However, the high cost of collecting responses or labels often leads to a scarcity of samples, which significantly diminishes the accuracy of statistical inferences, particularly in high-dimensional settings. To address this limitation, transfer learning can be utilized for high-dimensional data with discrete responses by incorporating relevant source data into the target study of interest. Within the framework of generalized linear models, the case where responses are bounded is first considered, and an importance-weighted transfer learning method, referred to as IWTL-DR, is proposed. This method selects data at the individual level, thereby utilizing the source data more efficiently. Subsequently, this approach is extended to scenarios involving unbounded responses. Theoretical properties of the IWTL-DR method are established and compared with existing techniques. Extensive simulations and analyses of real data show the advantages of our approach.
Computational Statistics & Data Analysis, Volume 215, Article 108292
Citations: 0
Bilateral matrix spatiotemporal autoregressive model
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-15 DOI: 10.1016/j.csda.2025.108291
Lei Qin, Xiaomei Zhang, Yingqiu Zhu, Yang Chen, Ben-Chang Shia
As time series with matrix structures become more and more common in the fields of finance, economics, and management, modeling matrix-valued time series has become an emerging research hotspot. Spatial effects induced by different locations play an important role in the analysis of time series. Although the matrix autoregressive (MAR) model provides a promising solution for modeling matrix-valued time series, it only models the dynamic effects in the temporal dimension, without capturing the spatial effects. In this paper, we propose a bilateral matrix spatiotemporal autoregressive model (BMSAR), which fully considers the pure spatial effects, pure dynamic effects, and time-delay spatial effects while maintaining and utilizing the matrix structure. To solve the endogeneity problem, the estimation process for BMSAR iterates between the least squares method and the Yule-Walker equation. The simulation results show that, compared with the MAR model, the BMSAR model effectively reflects the impact of spatial structure on the sequence observations. The estimator for BMSAR proposed in this paper is consistent, and it achieves promising performance when the sample size is relatively large. The proposed model and algorithm are also verified using the trade and macroeconomic indicator datasets of the seven G7 countries, and the prediction accuracy is significantly improved compared with the existing models.
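As background for the MAR baseline mentioned in this abstract, the sketch below simulates a first-order matrix autoregression in its commonly used bilinear form, X_t = A X_{t-1} B^T + E_t, and checks the equivalent vectorized form. The abstract does not state the model equations, so this form, the dimensions, and the coefficient matrices `A` and `B` are assumptions for illustration; the BMSAR model itself is not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, T = 3, 2, 50
A = 0.5 * np.eye(m) + 0.1 * rng.standard_normal((m, m))   # hypothetical row-wise coefficients
B = 0.5 * np.eye(n) + 0.1 * rng.standard_normal((n, n))   # hypothetical column-wise coefficients

# Simulate X_t = A X_{t-1} B^T + E_t with standard normal noise.
X = np.zeros((T, m, n))
for t in range(1, T):
    X[t] = A @ X[t - 1] @ B.T + rng.standard_normal((m, n))

# Column-stacking vec turns the bilinear form into a Kronecker-structured VAR:
# vec(A X B^T) = (B kron A) vec(X).
vec = lambda M: M.reshape(-1, order="F")
lhs = vec(A @ X[10] @ B.T)
rhs = np.kron(B, A) @ vec(X[10])
```

The identity makes the point about "maintaining the matrix structure" concrete: the bilinear form uses m*m + n*n coefficients, whereas an unrestricted vector autoregression on the stacked series would need (m*n)^2.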
Computational Statistics & Data Analysis, Volume 215, Article 108291
Citations: 0
Denoising over networks with applications to partially observed epidemics
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-10 DOI: 10.1016/j.csda.2025.108276
Claire Donnat, Olga Klopp, Nicolas Verzelen
A novel method is introduced for denoising partially observed signals over networks using graph total variation (TV) regularization, a technique adapted from signal processing to handle binary data. This approach extends existing results derived for Gaussian data to the discrete, binary case — a method hereafter referred to as “one-bit TV denoising.” The framework considers a network represented as a set of nodes with binary observations, where edges encode pairwise relationships between nodes. A key theoretical contribution is the establishment of consistency guarantees of graph TV denoising for the recovery of underlying node-level probabilities. The method is well suited for settings with missing data, enabling robust inference from incomplete observations. Extensive numerical experiments and real-world applications further highlight its effectiveness, underscoring its potential in various practical scenarios that require denoising and prediction on networks with binary-valued data. Finally, applications to two real-world epidemic scenarios demonstrate that one-bit total variation denoising significantly enhances the accuracy of network-based nowcasting and forecasting.
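To make the graph total variation penalty concrete, the sketch below evaluates it on a small path graph and combines it with a Bernoulli negative log-likelihood restricted to observed nodes. The objective is an assumed form chosen for illustration; the abstract does not give the exact estimator, and the function names, the toy graph, and the penalty weight `lam` are hypothetical.

```python
import numpy as np

def graph_tv(theta, edges):
    """Graph total variation: sum of |theta_i - theta_j| over edges (i, j)."""
    i, j = np.asarray(edges).T
    return np.abs(theta[i] - theta[j]).sum()

def one_bit_tv_objective(theta, y, observed, edges, lam):
    """Bernoulli negative log-likelihood on observed nodes plus a graph TV penalty.
    An assumed form for illustration; the paper's estimator may differ."""
    p = np.clip(theta[observed], 1e-12, 1 - 1e-12)
    nll = -np.sum(y[observed] * np.log(p) + (1 - y[observed]) * np.log(1 - p))
    return nll + lam * graph_tv(theta, edges)

# Path graph on 4 nodes; node 2 is unobserved (missing data).
edges = [(0, 1), (1, 2), (2, 3)]
y = np.array([1.0, 1.0, 0.0, 0.0])
observed = np.array([0, 1, 3])
smooth = np.array([0.8, 0.8, 0.2, 0.2])   # piecewise-constant candidate
rough = np.array([0.8, 0.2, 0.8, 0.2])    # oscillating candidate

tv_smooth, tv_rough = graph_tv(smooth, edges), graph_tv(rough, edges)
obj_smooth = one_bit_tv_objective(smooth, y, observed, edges, lam=1.0)
obj_rough = one_bit_tv_objective(rough, y, observed, edges, lam=1.0)
```

The unobserved node contributes nothing to the likelihood term, yet its probability is still constrained through the edges it shares with observed neighbours, which is how a penalty of this kind can handle missing observations.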
Computational Statistics & Data Analysis, Volume 215, Article 108276
Citations: 0
Measuring multivariate regression association via spatial sign
IF 1.6 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-03 DOI: 10.1016/j.csda.2025.108288
Jia-Han Shih, Yi-Hau Chen
A regression association measure is proposed for capturing predictability of a multivariate outcome $Y=(Y_1,\ldots,Y_d)$ from a multivariate covariate $X=(X_1,\ldots,X_p)$. Motivated by existing measures, the conventional Kendall’s tau is first generalized to measure multivariate association between two random vectors. Then the predictability of $Y$ from $X$ is measured by the generalized multivariate Kendall’s tau between $Y$ and $Y'$, where $Y$ and $Y'$ share the same conditional distribution and are conditionally independent given $X$. The proposed regression association measure can be expressed as the proportion of the variance of a function of $Y$ that can be explained by $X$, indicating that the measure has a direct interpretation in terms of predictability. Based on the proposed measure, a conditional regression association measure is further proposed, which can be utilized to perform variable selection. Since the proposed measures are based on $Y$ and $Y'$, a simple nonparametric estimation method based on nearest neighbors is available. An R package, MRAM, has been developed for implementation. Simulation studies are carried out to assess the performance of the proposed methods and real data examples are analyzed for illustration.
提出了一种回归关联度量,用于从多变量协变量X=(X1,…,Xp)中捕获多变量结果Y=(Y1,…,Yd)的可预测性。在现有测度的激励下,首先将传统的肯德尔τ推广到两个随机向量之间的多变量关联度量。然后,Y对X的可预测性通过Y和Y ‘之间的广义多元肯德尔τ来衡量,其中Y和Y ’具有相同的条件分布,并且在给定X的情况下是条件独立的。所提出的回归关联度量可以表示为Y的一个函数的方差所占的比例,该函数可以被X解释,表明该度量在可预测性方面具有直接的解释。在此基础上,进一步提出了一种条件回归关联测度,利用该测度进行变量选择。由于所提出的度量是基于Y和Y '的,因此可以使用一种简单的基于最近邻的非参数估计方法。一个R包,MRAM,已经开发实现。通过仿真研究来评估所提方法的性能,并对实际数据实例进行了分析。
Computational Statistics & Data Analysis, Volume 215, Article 108288.
Citations: 0
Fast autoregressive model for multivariate dependent outcomes with application to lipidomics analysis for Alzheimer’s disease and APOE-ε4
IF 1.6 | Zone 3 (Mathematics) | Q3, Computer Science, Interdisciplinary Applications | Pub Date: 2025-09-29 | DOI: 10.1016/j.csda.2025.108280
Hwiyoung Lee, Zhenyao Ye, Chixiang Chen, Peter Kochunov, L. Elliot Hong, Shuo Chen
Association analysis of multivariate omics outcomes is challenging due to the high dimensionality of, and inter-correlation among, the outcome variables. In practice, classic multi-univariate approaches are commonly employed: a linear regression model is fitted to each individual outcome, followed by adjustment for multiplicity through control of the false discovery rate (FDR) or family-wise error rate (FWER). While straightforward, these multi-univariate methods overlook dependencies between outcome variables. This oversight leads to less accurate statistical inference, characterized by lower power and an increased false discovery rate, and ultimately to reduced replicability across studies. Recently, advanced frequentist and Bayesian methods have been developed to account for these dependencies. However, these methods often pose significant computational challenges for researchers in the field. To bridge this gap, a computationally efficient autoregressive multivariate regression model is proposed that explicitly accounts for the dependence structure among outcome variables. Extensive simulations demonstrate that the approach provides more accurate multivariate inference than traditional methods and remains robust even under model misspecification. Additionally, the proposed method is applied to investigate whether the associations between serum lipidomics outcomes and Alzheimer’s disease differ between ε4 allele carriers and non-carriers of the apolipoprotein E (APOE) gene.
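The multiplicity adjustment used by the multi-univariate baseline in this abstract can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up procedure for FDR control, not the autoregressive model the authors propose; the function name `benjamini_hochberg` and the example p-values are assumptions made here for illustration only.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha.

    Standard Benjamini-Hochberg step-up rule: find the largest rank k with
    p_(k) <= (k / m) * alpha, then reject every hypothesis ranked <= k.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-outcome p-values, e.g. one per fitted univariate regression.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(pvals, alpha=0.05)
print(decisions)
```

Because each p-value is compared only with its own rank-based threshold, the procedure takes no account of how the outcomes depend on one another, which is the limitation the abstract points to.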
Computational Statistics & Data Analysis, Volume 215, Article 108280.
Citations: 0
Bootstrap-based goodness-of-fit test for parametric families of conditional distributions
IF 1.6 | Zone 3 (Mathematics) | Q3, Computer Science, Interdisciplinary Applications | Pub Date: 2025-09-27 | DOI: 10.1016/j.csda.2025.108289
Gitte Kremling, Gerhard Dikta
A consistent goodness-of-fit test for distributional regression is introduced. The test statistic is based on a process that traces the difference between a nonparametric and a semi-parametric estimate of the marginal distribution function of Y. As its asymptotic null distribution is not distribution-free, a parametric bootstrap method is used to determine critical values. Empirical results suggest that, in certain scenarios, the test outperforms existing specification tests by achieving higher power, thereby offering greater sensitivity to deviations from the assumed parametric distribution family. Notably, the proposed test does not involve any hyperparameters and can easily be applied to individual datasets using the gofreg package in R.
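The parametric bootstrap step mentioned in this abstract can be sketched in a much simpler setting. The example below is an assumption-laden illustration, not the authors' test and not the gofreg API: it drops the covariates, takes an exponential family as the null model, and swaps in a Kolmogorov-Smirnov distance as the test statistic. All function names are hypothetical.

```python
import math
import random


def ks_statistic_exponential(sample, rate):
    """Kolmogorov-Smirnov distance between the empirical CDF and Exp(rate)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def parametric_bootstrap_pvalue(sample, n_boot=200, seed=0):
    """Bootstrap p-value for H0: the sample comes from some exponential law."""
    rng = random.Random(seed)
    n = len(sample)
    rate_hat = n / sum(sample)  # maximum likelihood estimate of the rate
    observed = ks_statistic_exponential(sample, rate_hat)
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted null model, then re-estimate the parameter
        # so that the estimation effect is reproduced in each bootstrap draw.
        boot = [rng.expovariate(rate_hat) for _ in range(n)]
        boot_rate = n / sum(boot)
        if ks_statistic_exponential(boot, boot_rate) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


rng = random.Random(1)
exp_data = [rng.expovariate(2.0) for _ in range(100)]   # null model holds
uni_data = [rng.uniform(5.0, 6.0) for _ in range(100)]  # null model violated
p_exp = parametric_bootstrap_pvalue(exp_data)
p_uni = parametric_bootstrap_pvalue(uni_data)
print(p_exp, p_uni)
```

Re-estimating the parameter inside every bootstrap replicate is what lets the simulated statistics mimic a null distribution that depends on the fitted model, which is why tabulated critical values are not used here.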
Computational Statistics & Data Analysis, Volume 215, Article 108289.
Citations: 0