
Computational Statistics & Data Analysis: Latest Articles

Change-point detection for multivariate nonparametric regression with deep neural networks
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 218, Article 108334 | Pub Date: 2025-12-25 | DOI: 10.1016/j.csda.2025.108334
Houlin Zhou, Hanbing Zhu, Xuejun Wang
This article addresses the problem of detecting structural changes in multivariate nonparametric regression models, which commonly arise in high-dimensional and time-dependent data analysis. We propose a CUSUM-type test statistic constructed from estimators obtained via deep neural networks (DNNs). The theoretical properties of the proposed test statistic are rigorously derived under the null and alternative hypotheses. Under the assumptions of a low-dimensional manifold structure in the data support and a hierarchical model architecture, we demonstrate that the DNN-based change-point detection method can effectively mitigate the curse of dimensionality. Furthermore, we establish the asymptotic properties and derive the convergence rate of the estimator for the change-point location. Extensive comparative simulation studies confirm the effectiveness and superior performance of the proposed approach. Finally, we illustrate the practical applicability of the method through an empirical analysis using real-world regional electricity consumption data.
Citations: 0
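As a rough illustration of the CUSUM idea named in the abstract above, the sketch below computes a classical mean-change CUSUM statistic on a one-dimensional series of residuals. It is not the paper's DNN-based statistic: the function name `cusum_statistic`, the simulated residuals, and the whole-sample variance estimate are all assumptions made for illustration.

```python
import numpy as np

def cusum_statistic(residuals):
    """Return (max scaled CUSUM value, estimated change index) for a 1-D series."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    # Partial sums of the centred series: S_k = sum_{i <= k} (e_i - mean(e)).
    partial = np.cumsum(e - e.mean())
    # Simplification: the scale is estimated from the whole series.
    sigma = e.std(ddof=1)
    # Drop the last partial sum, which is zero by construction.
    scaled = np.abs(partial[:-1]) / (sigma * np.sqrt(n))
    k_hat = int(np.argmax(scaled)) + 1  # number of observations before the change
    return float(scaled.max()), k_hat

rng = np.random.default_rng(0)
# Hypothetical residuals: the mean shifts from 0 to 1.5 after observation 120.
e = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(1.5, 1.0, 80)])
stat, k_hat = cusum_statistic(e)
print(f"CUSUM statistic: {stat:.2f}, estimated change point: {k_hat}")
```

In a regression setting the residuals would come from a fitted regression function; here they are simulated directly so that the example stays self-contained.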
Nonparametric density estimation on complex domains using manifold-aware Bayesian additive tree models
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 217, Article 108335 | Pub Date: 2025-12-25 | DOI: 10.1016/j.csda.2025.108335
Isaac Diaz-Ray, Huiyan Sang, Guanyu Hu, Ligang Lu
Density or intensity function estimation for point pattern data observed on complex domains finds wide application in spatial data analysis. However, many popular density estimation methods face challenges when domains have irregular boundaries, line network structures, sharp concavities, or interior holes. A nonparametric Bayesian additive ensemble of spanning trees model is developed to model the distribution of event occurrences on complex domains. The model uses random spanning trees as weak learners, which produce flexible, contiguous partitions of the domain while respecting the domain's geometry and constraints. The method has the advantage of capturing both varying smoothness and sharp changes in density functions. An efficient exact likelihood-based Bayesian inference algorithm is proposed to estimate the density function with uncertainty measures, leveraging a data thinning strategy combined with Poisson-Gamma conjugacy. Simulation studies on various complex domains demonstrate the advantages of the proposed model over competing methods. The method is further applied to the analysis of basketball shot data and crime locations on a road network.
Citations: 0
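The Poisson-Gamma conjugacy mentioned in the abstract above is a standard result. It is stated here for a single region only, as a reminder of why it allows closed-form updates. The symbols $A$ (a region with area $|A|$), $\lambda$ (a constant intensity on $A$), and the hyperparameters $a, b$ are notation assumed for this illustration, not taken from the paper.

```latex
% Gamma prior (shape a, rate b) on a constant intensity \lambda over region A
\lambda \sim \mathrm{Gamma}(a, b), \qquad
N \mid \lambda \sim \mathrm{Poisson}(\lambda\,|A|)
% Posterior after observing N = n events in A
\quad\Longrightarrow\quad
\lambda \mid N = n \;\sim\; \mathrm{Gamma}\bigl(a + n,\; b + |A|\bigr).
```

The posterior mean is $(a + n)/(b + |A|)$, which moves from the prior mean $a/b$ toward the empirical rate $n/|A|$ as more events are observed.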
Multi-population sufficient dimension reduction
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 217, Article 108321 | Pub Date: 2025-12-24 | DOI: 10.1016/j.csda.2025.108321
Xuerong Meggie Wen, Yuexiao Dong, Li-Xing Zhu
A novel dimension-reduction method is introduced for multi-population data. The approach conducts a joint analysis that exploits information shared across populations while accommodating population-specific effects. Unlike partial dimension reduction methods, which identify related directions across all populations, or conditional analyses conducted independently within each population, the proposed two-step procedure leverages cross-population information to enhance estimation accuracy. The methodology is demonstrated through simulations and two real-data applications.
Citations: 0
Seasonal ARIMA models with a random period
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 217, Article 108320 | Pub Date: 2025-12-09 | DOI: 10.1016/j.csda.2025.108320
Abdelhakim Aknouche, Stefanos Dimitrakopoulos, Nadia Rabehi
A general class of seasonal autoregressive integrated moving average (SARIMA) models, whose period is an independent and identically distributed random process taking values in a finite set, is proposed. This class of models is named random period seasonal ARIMA (SARIMAR). Attention is focused on three subclasses: the random period seasonal autoregressive (SARR) models, the random period seasonal moving average (SMAR) models, and the random period seasonal autoregressive moving average (SARMAR) models. First, the causality, invertibility, and autocovariance shape of these models are established. Then, the model components (coefficients, innovation variance, probability distribution of the period, and the unobserved sample path of the random period) are estimated using the Expectation-Maximization algorithm. In addition, a procedure for random elimination of seasonality is developed. A simulation study is conducted to assess the estimation accuracy of the proposed algorithmic scheme. Finally, the usefulness of the proposed methodology is illustrated with two applications, to the annual Wolf sunspot numbers and to the Canadian lynx data.
Citations: 0
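To make the idea of a random period concrete, the sketch below simulates one plausible first-order reading of the SARR model from the abstract above, namely X_t = phi * X_{t - S_t} + eps_t with S_t drawn independently from a finite set. This recursion, the function name `simulate_sarr`, and all parameter values are assumptions for illustration. The paper's exact specification may differ.

```python
import numpy as np

def simulate_sarr(n, phi, periods, probs, rng, burn_in=200):
    """Simulate X_t = phi * X_{t - S_t} + eps_t with an i.i.d. random period S_t.

    Assumed illustrative model, not necessarily the paper's definition.
    """
    periods = np.asarray(periods)
    max_s = int(periods.max())
    total = n + burn_in + max_s
    x = np.zeros(total)
    # Draw the unobserved period path and the Gaussian innovations up front.
    s = rng.choice(periods, size=total, p=probs)
    eps = rng.normal(size=total)
    for t in range(max_s, total):
        x[t] = phi * x[t - s[t]] + eps[t]
    # Discard the burn-in so the start-up zeros have little influence.
    return x[-n:], s[-n:]

rng = np.random.default_rng(3)
# Hypothetical setting: the period is 4 with probability 0.7 and 12 with probability 0.3.
x, s = simulate_sarr(n=500, phi=0.6, periods=[4, 12], probs=[0.7, 0.3], rng=rng)
print(x[:5], s[:5])
```

In an estimation setting the path `s` would be unobserved, which is why the abstract turns to the Expectation-Maximization algorithm.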
Debiased quantile significance testing with machine learning methods
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 217, Article 108319 | Pub Date: 2025-12-05 | DOI: 10.1016/j.csda.2025.108319
Jiarong Ding, Yanmei Shi, Niwen Zhou, Mei Yao, Xu Guo
Testing the significance of a subset of covariates for a response is a critical problem with broad applications. A novel nonparametric significance testing procedure is developed to test whether a set of target covariates provides incremental information about the conditional quantile of the response given the other covariates. The proposed test statistics are constructed within the framework of debiased machine learning, which enables flexible estimation of unknown functions by leveraging machine learning methods. The asymptotic properties of the proposed test statistic under the null hypothesis are established, and the power under the alternatives is analyzed, demonstrating the ability of the procedure to detect local alternatives at the optimal parametric rate. To further enhance power, an ensemble quantile significance testing procedure is introduced. Extensive numerical studies and real data applications are conducted to illustrate the finite-sample performance of the proposed testing procedures.
Citations: 0
Parametric estimation of conditional Archimedean copula generators for censored data
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 216, Article 108309 | Pub Date: 2025-11-30 | DOI: 10.1016/j.csda.2025.108309
Marie Michaelides, Hélène Cossette, Mathieu Pigeon
A novel framework is introduced for estimating Archimedean copula generators in a conditional setting by embedding endogenous variables directly within the generator function. Unlike standard copula constructions that rely on a fixed dependence structure across all covariate levels, the proposed methodology allows both the strength and the shape of dependence to evolve with the covariates. To identify the values of a continuous risk factor at which the dependence pattern undergoes substantive changes, an iterative splitting algorithm is developed to determine optimal partitioning points within the covariate range. The approach is evaluated through applications to a diabetic retinopathy study and a claims reserving analysis, illustrating that explicitly modelling covariate effects yields a more accurate representation of dependence and enhances the practical relevance of copula models in medical and actuarial settings.
Citations: 0
Sequential hierarchical Bayesian model and particle filter estimation with two-step RJMCMC resampling
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 216, Article 108304 | Pub Date: 2025-11-17 | DOI: 10.1016/j.csda.2025.108304
Yue Huan, Guoqiang Wang, Hai Xiang Lin
Data assimilation (DA) combines numerical model simulations with observed data to obtain the best possible description of a dynamical system and its uncertainty. Incorrect modeling assumptions can lead to filter divergence, making model identification an important issue in the field of DA. Variations in dynamic model structures can result in differences in parameter dimensions, complicating the resampling step in particle filters (PFs). To meet this challenge, the Sequential Hierarchical Bayesian Model (SHBM) is proposed in this paper, which integrates the evolution model and the observation model from the DA scheme with a hierarchical parameter model. A two-step resampling method is also proposed to estimate the SHBM: the first step uses the resampling scheme of the bootstrap filter to resample new particles based on their weights, which may produce some duplicate particles; the second step uses Reversible Jump Markov Chain Monte Carlo (RJMCMC) methods to draw new particles from the target distribution. This approach ensures particle diversity: the first step aims to avoid particle degeneracy, and the second aims to prevent sample impoverishment. The performance on the Advection Equation and Lorenz 96 examples demonstrates the effectiveness of the proposed method.
Citations: 0
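The first resampling step described in the abstract above, weight-based resampling as in the bootstrap filter, can be sketched as follows. Only that first step is shown. The RJMCMC move step is not implemented here, and the function names and the toy particle set are assumptions for illustration.

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) indices with probability proportional to the weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(w), size=len(w), replace=True, p=w)
    # After resampling, every particle carries the same weight.
    return particles[idx], np.full(len(w), 1.0 / len(w))

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights; small values signal degeneracy."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(1)
particles = np.arange(10, dtype=float)  # hypothetical scalar particles
weights = np.array([0.55, 0.25, 0.1, 0.04, 0.02, 0.01, 0.01, 0.01, 0.005, 0.005])

new_particles, new_weights = multinomial_resample(particles, weights, rng)
n_unique = np.unique(new_particles).size
print("ESS before:", effective_sample_size(weights))
print("distinct particles after resampling:", n_unique)
```

Resampling equalises the weights, which addresses degeneracy, but it does so by copying high-weight particles, so the number of distinct particles drops. That loss of diversity is the sample impoverishment that a subsequent MCMC move step is meant to counteract.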
A semi-parametric approach to receiver operating characteristic analysis with semi-continuous biomarker
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 216, Article 108305 | Pub Date: 2025-11-15 | DOI: 10.1016/j.csda.2025.108305
Baohao Wei, Dongsheng Tu, Chunlin Wang
The receiver operating characteristic (ROC) curve and its summary measures, such as the area under the curve (AUC) and the Youden index, are frequently used to evaluate the performance of a binary classifier based on a continuous biomarker and to identify a suitable cut-off point for classification. In clinical applications, the biomarker used for classification may be semi-continuous, in the sense that the observations contain excess zero values and the distribution of the positive values is skewed. In this paper, the distribution of a semi-continuous biomarker is modeled using a mixture of a discrete mass at zero and a continuous skewed positive component. In addition, the distributions of the continuous component in subjects with true negative and true positive outcomes are linked by a semi-parametric density ratio model to gain efficiency. Under this framework, unified estimation and inference procedures are proposed for the ROC curve, its important summary measures, and the associated cut-off point. The asymptotic properties of the proposed semi-parametric estimators are established and used to construct the corresponding confidence intervals. Simulation results demonstrate the desirable performance of these estimators and confidence intervals in various settings. The proposed semi-parametric approach is also applied to assess the semi-continuous BRCA1 biomarker as a valid prognostic biomarker for predicting cancer progression at 4 years, and to identify a cut-off point for classifying patients with advanced ovarian cancer into two groups with good and bad prognoses.
Citations: 0
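For orientation, the quantities named in the abstract above (ROC curve, AUC, Youden index, cut-off point) can be computed empirically on a zero-inflated marker as in the sketch below. This is a plain nonparametric baseline, not the paper's semi-parametric density ratio estimator. The simulated zero-inflated lognormal data and all function names are assumptions for illustration.

```python
import numpy as np

def empirical_roc(neg, pos):
    """Empirical ROC for the rule 'classify as positive when marker > c'."""
    neg = np.asarray(neg, dtype=float)
    pos = np.asarray(pos, dtype=float)
    cuts = np.unique(np.concatenate([neg, pos]))
    # Add a cut below every value so the curve starts at (FPR, TPR) = (1, 1).
    cuts = np.concatenate([[-np.inf], cuts])
    fpr = np.array([(neg > c).mean() for c in cuts])
    tpr = np.array([(pos > c).mean() for c in cuts])
    return cuts, fpr, tpr

def auc_mann_whitney(neg, pos):
    """AUC as P(pos > neg) + 0.5 * P(pos == neg); ties matter with excess zeros."""
    neg = np.asarray(neg, dtype=float)[:, None]
    pos = np.asarray(pos, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

rng = np.random.default_rng(2)

def draw(n, p_zero, log_mean):
    """Hypothetical semi-continuous marker: point mass at zero plus lognormal."""
    is_zero = rng.random(n) < p_zero
    return np.where(is_zero, 0.0, rng.lognormal(log_mean, 1.0, n))

neg = draw(300, p_zero=0.6, log_mean=0.0)
pos = draw(300, p_zero=0.2, log_mean=1.0)

cuts, fpr, tpr = empirical_roc(neg, pos)
youden = tpr - fpr                # Youden index J(c) = sensitivity + specificity - 1
best = int(np.argmax(youden))
auc = auc_mann_whitney(neg, pos)
print(f"AUC = {auc:.3f}, max Youden = {youden[best]:.3f} at cut-off {cuts[best]:.3f}")
```

The tie term in `auc_mann_whitney` is the reason the excess zeros need explicit handling: pairs in which both subjects have a zero marker value each contribute one half rather than zero or one.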
Hypothesis test in high dimensional multi-response linear models
IF 1.6 | Zone 3, Mathematics | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Volume 215, Article 108303 | Pub Date: 2025-11-10 | DOI: 10.1016/j.csda.2025.108303
Yuan Ke, Rongmao Zhang, Wenyang Zhang, Changliang Zou
Data with multiple responses is very common in economics, engineering, finance, and social science. Analyzing each response variable separately may not be a good strategy as this approach can overlook important information and lead to suboptimal results. In some cases, it may not even provide an answer to the question of interest. Multi-response linear models serve as an important tool for joint analysis. While the methodology and theory of classic multi-response linear models are well-established, they may not be applicable to high-dimensional cases. In this paper, we propose a powerful hypothesis test for the coefficient matrix of a high-dimensional multi-response linear model. We establish asymptotic results and conduct comprehensive simulation studies to demonstrate that the proposed hypothesis test is more powerful than alternative methods. Furthermore, we apply the hypothesis test to two real datasets, illustrating its usefulness in addressing practical problems.
Citations: 0
Modelling catastrophic extinction in stochastic birth-death process: Analytical insights, estimation, and efficient simulation
IF 1.6 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-07 DOI: 10.1016/j.csda.2025.108302
Clement Twumasi
A comprehensive analytical and computational framework is developed for the linear birth-death process (LBDP) with catastrophic extinction (BDC process), a continuous-time Markov model that incorporates sudden extinction events into the classical LBDP. Despite its conceptual simplicity, the underlying BDC process poses substantial challenges in deriving exact transition probabilities and performing reliable parameter estimation, particularly under discrete-time observations. While previous work established foundational properties using spectral methods and probability generating functions (PGFs), explicit analytical expressions for transition probabilities and theoretical moments have remained unavailable, limiting practical applications in extinction-prone systems. This limitation is addressed by reparameterising the PGF through functional restructuring, yielding exact closed-form expressions for the transition probability function and the theoretical moments of the discretely observed BDC process, with results validated through comprehensive numerical experiments for the first time. Three parameter estimation approaches tailored to the BDC process are introduced and evaluated: maximum likelihood estimation (MLE), generalised method of moments (GMM), and an embedded Galton-Watson (GW) approach, with trade-offs between computational efficiency and estimation accuracy examined across diverse simulation scenarios. To improve scalability, a Monte Carlo simulation framework based on a hybrid tau-leaping algorithm is formulated, specifically adapted to extinction-driven dynamics, offering a computationally efficient alternative to the exact stochastic simulation algorithm (SSA). 
The proposed methodologies offer a tractable and scalable foundation for incorporating the BDC process into applied stochastic models, particularly in ecological, epidemiological, and biological systems where populations are susceptible to sudden collapse due to catastrophic events such as host mortality or immune response.
Clement Twumasi. Modelling catastrophic extinction in stochastic birth-death process: Analytical insights, estimation, and efficient simulation. Computational Statistics & Data Analysis, vol. 215, Article 108302. Published 2025-11-07. DOI: 10.1016/j.csda.2025.108302
Citations: 0
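To make the BDC process in the Twumasi abstract above more concrete, here is a minimal sketch of an exact stochastic simulation (Gillespie-style SSA) of a linear birth-death process with a catastrophe event. It is not the paper's implementation, and it does not cover the hybrid tau-leaping scheme. The function name `simulate_bdc`, the parameter names, and in particular the assumption that the catastrophe fires at a constant rate whenever the population is positive are illustrative choices, not details taken from the abstract.

```python
import random


def simulate_bdc(n0, birth, death, catastrophe, t_max, rng):
    """Simulate one exact SSA path of a birth-death-catastrophe process.

    Assumed transition rates in state n > 0 (illustrative only):
      n -> n + 1  at rate birth * n
      n -> n - 1  at rate death * n
      n -> 0      at rate catastrophe (constant while n > 0)
    State 0 is absorbing. Returns a list of (time, population) jump points.
    """
    t, n = 0.0, n0
    path = [(t, n)]
    while n > 0:
        rates = (birth * n, death * n, catastrophe)
        total = sum(rates)
        if total <= 0.0:
            break  # no event can occur
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break  # the next event falls outside the observation window
        # Pick which event fires, with probability proportional to its rate.
        u = rng.random() * total
        if u < rates[0]:
            n += 1
        elif u < rates[0] + rates[1]:
            n -= 1
        else:
            n = 0  # catastrophe wipes out the population
        path.append((t, n))
    return path


if __name__ == "__main__":
    for time, size in simulate_bdc(10, 0.5, 0.6, 0.1, 5.0, random.Random(1)):
        print(f"t={time:.3f}  n={size}")
```

Because every event is simulated individually, the cost of this exact scheme grows with the population size, which is the kind of scalability issue that an approximate method such as tau-leaping is meant to ease.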