Pub Date: 2025-12-25, DOI: 10.1016/j.csda.2025.108334
Houlin Zhou , Hanbing Zhu , Xuejun Wang
This article addresses the problem of detecting structural changes in multivariate nonparametric regression models, which commonly arise in high-dimensional and time-dependent data analysis. We propose a CUSUM-type test statistic constructed from estimators obtained via deep neural networks (DNNs). The theoretical properties of the proposed test statistic are rigorously derived under the null and alternative hypotheses. Under the assumptions of a low-dimensional manifold structure in the data support and a hierarchical model architecture, we demonstrate that the DNN-based change-point detection method can effectively mitigate the curse of dimensionality. Furthermore, we establish the asymptotic properties and derive the convergence rate of the estimator for the change-point location. Extensive comparative simulation studies confirm the effectiveness and superior performance of the proposed approach. Finally, we illustrate the practical applicability of the method through an empirical analysis using real-world regional electricity consumption data.
Title: Change-point detection for multivariate nonparametric regression with deep neural networks. Computational Statistics & Data Analysis, Volume 218, Article 108334.
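The CUSUM construction can be illustrated with a toy sketch: residuals from a fitted regression are cumulatively summed, and a structural change in time shows up as a peak in the partial-sum path. The paper's DNN estimator is replaced here by a simple polynomial fit, and the data-generating model, change location, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(size=n)
# Hypothetical regression with a mean shift at time index 120
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
y[120:] += 1.0

# Stand-in for the fitted regression (the paper uses a DNN estimator):
# a global polynomial fit in x, so the time-ordered shift survives in the residuals.
fit = np.polyval(np.polyfit(x, y, 5), x)
resid = y - fit

# CUSUM-type statistic: maximal absolute scaled partial sum of residuals,
# with the argmax serving as the change-point location estimate.
s = np.cumsum(resid - resid.mean())
T = np.max(np.abs(s)) / (resid.std() * np.sqrt(n))
khat = int(np.argmax(np.abs(s)))
```

With a shift this large, `T` is far above typical null fluctuations and `khat` lands near the true change index.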
Pub Date: 2025-12-25, DOI: 10.1016/j.csda.2025.108335
Isaac Diaz-Ray , Huiyan Sang , Guanyu Hu , Ligang Lu
Density or intensity function estimation for point pattern data observed on complex domains finds wide applications in spatial data analysis. However, many existing popular density estimation methods face challenges when domains have irregular boundaries, line network structures, sharp concavities, or interior holes. A nonparametric Bayesian additive ensemble of spanning trees model is developed to model the distribution of event occurrences on complex domains. This model uses a random spanning tree weak learner, which can produce flexible and contiguous domain partitions while respecting its geometry and constraints. The method has the advantage of capturing both varying smoothness and sharp changes in density functions. An efficient exact likelihood-based Bayesian inference algorithm is proposed to estimate the density function with uncertainty measures, leveraging a data thinning strategy combined with Poisson-Gamma conjugacy. Simulation studies on various complex domains demonstrate the advantages of the proposed model over competing methods. The method is further applied to the analysis of basketball shot data and crime locations on a road network.
Title: Nonparametric density estimation on complex domains using manifold-aware Bayesian additive tree models. Computational Statistics & Data Analysis, Volume 217, Article 108335.
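The Poisson-Gamma conjugacy underlying the exact inference step can be sketched in a few lines: if cell counts satisfy N_i ~ Poisson(λ e_i) with a Gamma(a, b) prior on λ (shape a, rate b), the posterior is Gamma(a + Σ N_i, b + Σ e_i). The exposures and prior values below are hypothetical, and this shows only the conjugate update, not the paper's tree-based partition model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior: lambda ~ Gamma(shape=a, rate=b); counts N_i ~ Poisson(lambda * e_i)
a, b = 2.0, 1.0
lam_true = 3.0
exposure = rng.uniform(0.5, 1.5, size=500)   # hypothetical cell areas
counts = rng.poisson(lam_true * exposure)

# Conjugate posterior: Gamma(a + sum of counts, b + sum of exposures)
a_post = a + counts.sum()
b_post = b + exposure.sum()
post_mean = a_post / b_post   # posterior mean of the intensity
```

With 500 cells the posterior mean concentrates tightly around the true intensity.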
Pub Date: 2025-12-24, DOI: 10.1016/j.csda.2025.108321
Xuerong Meggie Wen , Yuexiao Dong , Li-Xing Zhu
A novel dimension-reduction method is introduced for multi-population data. The approach conducts a joint analysis that exploits information shared across populations while accommodating population-specific effects. Unlike partial dimension reduction methods, which identify related directions across all populations, or conditional analyses conducted independently within each population, the proposed two-step procedure leverages cross-population information to enhance estimation accuracy. The methodology is demonstrated through simulations and two real-data applications.
Title: Multi-population sufficient dimension reduction. Computational Statistics & Data Analysis, Volume 217, Article 108321.
Pub Date: 2025-12-09, DOI: 10.1016/j.csda.2025.108320
Abdelhakim Aknouche , Stefanos Dimitrakopoulos , Nadia Rabehi
A general class of seasonal autoregressive integrated moving average (SARIMA) models, whose period is an independent and identically distributed random process valued in a finite set, is proposed. This class of models is named random period seasonal ARIMA (SARIMAR). Attention is focused on three subclasses: the random period seasonal autoregressive (SARR) models, the random period seasonal moving average (SMAR) models, and the random period seasonal autoregressive moving average (SARMAR) models. First, the causality, invertibility, and autocovariance shape of these models are established. Then, the estimation of the model components (coefficients, innovation variance, probability distribution of the period, and the unobserved sample path of the random period) is carried out using the Expectation-Maximization algorithm. In addition, a procedure for random elimination of seasonality is developed. A simulation study is conducted to assess the estimation accuracy of the proposed algorithmic scheme. Finally, the usefulness of the proposed methodology is illustrated with two applications: the annual Wolf sunspot numbers and the Canadian lynx data.
Title: Seasonal ARIMA models with a random period. Computational Statistics & Data Analysis, Volume 217, Article 108320.
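A random-period seasonal AR recursion of the SARR type can be simulated directly: at each time the period is drawn i.i.d. from a finite set, and the seasonal lag used in the recursion changes accordingly. The period set, its distribution, and the coefficient below are hypothetical, and this is a simulation sketch rather than the paper's EM estimation scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

periods = [4, 6]          # hypothetical finite set of possible periods
probs = [0.7, 0.3]        # i.i.d. distribution of the random period
phi = 0.5                 # seasonal AR coefficient, |phi| < 1 (causality)
n, sigma = 1000, 1.0

y = np.zeros(n)
eps = rng.normal(scale=sigma, size=n)
for t in range(n):
    s = rng.choice(periods, p=probs)            # period drawn anew at time t
    y[t] = (phi * y[t - s] if t >= s else 0.0) + eps[t]
```

The stationary variance solves v = phi^2 v + sigma^2, i.e. v = 1/(1 - 0.25) ≈ 1.33, which the sample variance should roughly match.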
Pub Date: 2025-12-05, DOI: 10.1016/j.csda.2025.108319
Jiarong Ding , Yanmei Shi , Niwen Zhou , Mei Yao , Xu Guo
Testing the significance of a subset of covariates for a response is a critical problem with broad applications. A novel nonparametric significance testing procedure is developed to test whether a set of target covariates provides incremental information about the conditional quantile of the response given the other covariates. The proposed test statistics are constructed within the framework of debiased machine learning, which enables flexible estimation of unknown functions by leveraging machine learning methods. The asymptotic properties of the proposed test statistic under the null hypothesis are established, and the power under the alternatives is analyzed, demonstrating the ability of the procedure to detect local alternatives at the optimal parametric rate. To further enhance power, an ensemble quantile significance testing procedure is introduced. Extensive numerical studies and real data applications are conducted to illustrate the finite-sample performance of the proposed testing procedures.
Title: Debiased quantile significance testing with machine learning methods. Computational Statistics & Data Analysis, Volume 217, Article 108319.
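The conditional-quantile building block can be illustrated with the check (pinball) loss: the τ-th quantile minimizes its expectation, which is what quantile-regression-style estimators exploit. A grid-search sketch on simulated standard-normal data (all values hypothetical; this is not the paper's debiased test statistic):

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check (pinball) function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(3)
z = rng.normal(size=2000)
tau = 0.75

# The tau-th quantile minimizes the expected check loss; approximate the
# minimizer over a grid and compare to the sample quantile.
grid = np.linspace(-1, 2, 301)
losses = [check_loss(z - q, tau).mean() for q in grid]
q_hat = grid[int(np.argmin(losses))]
```

The grid minimizer agrees with `np.quantile(z, tau)` up to the grid resolution, since the empirical check loss is piecewise linear and convex with its minimum at the sample quantile.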
Pub Date: 2025-11-30, DOI: 10.1016/j.csda.2025.108309
Marie Michaelides , Hélène Cossette , Mathieu Pigeon
A novel framework is introduced for estimating Archimedean copula generators in a conditional setting by embedding endogenous variables directly within the generator function. Unlike standard copula constructions that rely on a fixed dependence structure across all covariate levels, the proposed methodology allows both the strength and the shape of dependence to evolve with the covariates. To identify the values of a continuous risk factor at which the dependence pattern undergoes substantive changes, an iterative splitting algorithm is developed to determine optimal partitioning points within the covariate range. The approach is evaluated through applications to a diabetic retinopathy study and a claims reserving analysis, illustrating that explicitly modelling covariate effects yields a more accurate representation of dependence and enhances the practical relevance of copula models in medical and actuarial settings.
Title: Parametric estimation of conditional Archimedean copula generators for censored data. Computational Statistics & Data Analysis, Volume 216, Article 108309.
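The Archimedean structure can be illustrated with the Clayton generator ψ(t) = (1 + t)^(-1/θ) and Marshall-Olkin sampling — a standard unconditional construction, not the paper's conditional estimator: draw V from the Gamma distribution whose Laplace transform is ψ, then set U_i = ψ(E_i / V) with E_i standard exponential. The θ value below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_clayton(n, theta):
    """Marshall-Olkin sampling from a Clayton copula with generator
    psi(t) = (1 + t)^(-1/theta); Kendall's tau = theta / (theta + 2)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    e = rng.exponential(size=(n, 2))
    return (1.0 + e / v) ** (-1.0 / theta)

theta = 2.0                      # implies Kendall's tau = 0.5
u = sample_clayton(5000, theta)

# Empirical Kendall's tau via pairwise concordance on a subsample (naive O(n^2))
sub = u[:500]
conc = np.sign((sub[:, None, 0] - sub[None, :, 0]) * (sub[:, None, 1] - sub[None, :, 1]))
tau_hat = conc[np.triu_indices(500, 1)].mean()
```

The empirical tau recovers the theoretical value θ/(θ+2) = 0.5 up to sampling noise.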
Pub Date: 2025-11-17, DOI: 10.1016/j.csda.2025.108304
Yue Huan , Guoqiang Wang , Hai Xiang Lin
Data assimilation (DA) combines numerical model simulations with observed data to obtain the best possible description of a dynamical system and its uncertainty. Incorrect modeling assumptions can lead to filter divergence, making model identification an important issue in the field of DA. Variations in dynamic model structures can result in differences in parameter dimensions, complicating the resampling step in particle filters (PFs). To meet this challenge, the Sequential Hierarchical Bayesian Model (SHBM) is proposed, which integrates the evolution and observation models from the DA scheme with a hierarchical parameter model. A two-step resampling method is also proposed to estimate the SHBM: the first step uses the resampling scheme of the bootstrap filter to draw new particles according to their weights, which may produce duplicate particles; the second step uses Reversible Jump Markov Chain Monte Carlo (RJMCMC) to draw new particles from the target distribution. This approach preserves particle diversity: the first step guards against particle degeneracy, while the second prevents sample impoverishment. The performance on an Advection Equation example and a Lorenz 96 example demonstrates the effectiveness of the proposed method.
Title: Sequential hierarchical Bayesian model and particle filter estimation with two-step RJMCMC resampling. Computational Statistics & Data Analysis, Volume 216, Article 108304.
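The first resampling step — bootstrap-filter multinomial resampling, which deliberately allows the duplicate particles that the RJMCMC move then diversifies — can be sketched as follows. The particle cloud and the Gaussian-likelihood weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def multinomial_resample(particles, weights):
    """Bootstrap-filter resampling: draw particle indices with probability
    proportional to the weights; duplicated particles are expected."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Hypothetical prior cloud and a likelihood concentrated around 1.0
particles = np.linspace(-3, 3, 1000)
weights = np.exp(-0.5 * (particles - 1.0) ** 2)
new = multinomial_resample(particles, weights)
```

After resampling the cloud concentrates near the high-likelihood region, and the duplicates it contains are exactly what the second (RJMCMC) step is designed to rejuvenate.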
Pub Date: 2025-11-15, DOI: 10.1016/j.csda.2025.108305
Baohao Wei , Dongsheng Tu , Chunlin Wang
The receiver operating characteristic (ROC) curve and its summary measures, such as the area under the curve (AUC) and the Youden index, are frequently used to evaluate the performance of a binary classifier based on data from a continuous biomarker and, at the same time, to identify a suitable cut-off point for classification. In clinical applications, the biomarker used for classification may be semi-continuous, in the sense that the observations contain excess zero values and the distribution of the positive values is skewed. In this paper, the distribution of a semi-continuous biomarker is modeled using a mixture of a discrete mass at zero and a continuous skewed positive component. In addition, the distributions of the continuous component in subjects with true negative and positive outcomes are linked by a semi-parametric density ratio model to gain efficiency. Under this framework, unified estimation and inference procedures are proposed for the ROC curve, its important summary measures, and the associated cut-off point. The asymptotic properties of the proposed semi-parametric estimators are established and used to construct their corresponding confidence intervals. Simulation results demonstrate the desirable performance of these estimators and confidence intervals in various settings.
Title: A semi-parametric approach to receiver operating characteristic analysis with semi-continuous biomarker. Computational Statistics & Data Analysis, Volume 216, Article 108305.
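The empirical versions of the ROC summaries are easy to compute directly. The sketch below uses a hypothetical zero-inflated lognormal biomarker (a point mass at zero plus a skewed positive part, as in the paper's setting) and the nonparametric estimators, not the paper's semi-parametric density-ratio procedure; note the tie-correction in the AUC from the shared zero mass.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical semi-continuous biomarker: zero mass + skewed positive component
n0, n1 = 400, 400
neg = np.where(rng.uniform(size=n0) < 0.5, 0.0, rng.lognormal(0.0, 1.0, n0))
pos = np.where(rng.uniform(size=n1) < 0.2, 0.0, rng.lognormal(1.0, 1.0, n1))

# Empirical AUC = P(pos > neg) + 0.5 * P(pos == neg); ties arise from the zeros
diff = pos[:, None] - neg[None, :]
auc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

# Youden index: maximize sensitivity + specificity - 1 over candidate cut-offs
cuts = np.unique(np.concatenate([neg, pos]))
sens = np.array([(pos > c).mean() for c in cuts])
spec = np.array([(neg <= c).mean() for c in cuts])
youden = np.max(sens + spec - 1)
cut_opt = cuts[int(np.argmax(sens + spec - 1))]
```

The AUC here sits well above 0.5 because the positive group has both fewer zeros and a larger lognormal location.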
Pub Date: 2025-11-10, DOI: 10.1016/j.csda.2025.108303
Yuan Ke , Rongmao Zhang , Wenyang Zhang , Changliang Zou
Data with multiple responses is very common in economics, engineering, finance, and social science. Analyzing each response variable separately may not be a good strategy as this approach can overlook important information and lead to suboptimal results. In some cases, it may not even provide an answer to the question of interest. Multi-response linear models serve as an important tool for joint analysis. While the methodology and theory of classic multi-response linear models are well-established, they may not be applicable to high-dimensional cases. In this paper, we propose a powerful hypothesis test for the coefficient matrix of a high-dimensional multi-response linear model. We establish asymptotic results and conduct comprehensive simulation studies to demonstrate that the proposed hypothesis test is more powerful than alternative methods. Furthermore, we apply the hypothesis test to two real datasets, illustrating its usefulness in addressing practical problems.
Title: Hypothesis test in high dimensional multi-response linear models. Computational Statistics & Data Analysis, Volume 215, Article 108303.
Pub Date: 2025-11-07, DOI: 10.1016/j.csda.2025.108302
Clement Twumasi
A comprehensive analytical and computational framework is developed for the linear birth-death process (LBDP) with catastrophic extinction (BDC process), a continuous-time Markov model that incorporates sudden extinction events into the classical LBDP. Despite its conceptual simplicity, the underlying BDC process poses substantial challenges in deriving exact transition probabilities and performing reliable parameter estimation, particularly under discrete-time observations. While previous work established foundational properties using spectral methods and probability generating functions (PGFs), explicit analytical expressions for transition probabilities and theoretical moments have remained unavailable, limiting practical applications in extinction-prone systems. This limitation is addressed by reparameterising the PGF through functional restructuring, yielding exact closed-form expressions for the transition probability function and the theoretical moments of the discretely observed BDC process, with results validated through comprehensive numerical experiments for the first time. Three parameter estimation approaches tailored to the BDC process are introduced and evaluated: maximum likelihood estimation (MLE), generalised method of moments (GMM), and an embedded Galton-Watson (GW) approach, with trade-offs between computational efficiency and estimation accuracy examined across diverse simulation scenarios. To improve scalability, a Monte Carlo simulation framework based on a hybrid tau-leaping algorithm is formulated, specifically adapted to extinction-driven dynamics, offering a computationally efficient alternative to the exact stochastic simulation algorithm (SSA). 
The proposed methodologies offer a tractable and scalable foundation for incorporating the BDC process into applied stochastic models, particularly in ecological, epidemiological, and biological systems where populations are susceptible to sudden collapse due to catastrophic events such as host mortality or immune response.
Title: Modelling catastrophic extinction in stochastic birth-death process: Analytical insights, estimation, and efficient simulation. Computational Statistics & Data Analysis, Volume 215, Article 108302.
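A tau-leaping step for a BDC-type process can be sketched as follows: within each leap, births and deaths are Poisson with linear (per-capita) rates, and a catastrophe at rate kappa resets the population to zero. The rates, leap size, and horizon below are hypothetical, and this is a plain tau-leaping sketch rather than the paper's hybrid algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)

def bdc_tau_leap(n0, lam, mu, kappa, t_end, tau=0.01):
    """Tau-leaping sketch of a linear birth-death process with catastrophic
    extinction: per-capita birth rate lam, per-capita death rate mu, and a
    catastrophe at rate kappa that sends the population instantly to zero."""
    n, t = n0, 0.0
    while t < t_end and n > 0:
        if rng.uniform() < 1.0 - np.exp(-kappa * tau):
            return 0, t + tau                      # catastrophe: extinction
        births = rng.poisson(lam * n * tau)        # linear birth rate lam * n
        deaths = rng.poisson(mu * n * tau)         # linear death rate mu * n
        n = max(n + births - deaths, 0)
        t += tau
    return n, t

# Replicates: even a supercritical population (lam > mu) is extinction-prone
# once catastrophes are included; here P(catastrophe by t=5) = 1 - exp(-1.5).
finals = np.array([bdc_tau_leap(10, 1.0, 0.5, 0.3, t_end=5.0)[0] for _ in range(300)])
extinct_frac = (finals == 0).mean()
```

With these rates roughly 78% of replicates go extinct by the horizon, almost entirely through catastrophes, since demographic extinction from 10 founders with lam > mu is negligible.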