First exit and Dirichlet problem for the nonisotropic tempered $\alpha$-stable processes
Pub Date: 2024-02-15 | DOI: 10.1007/s00180-024-01462-9
Xing Liu, Weihua Deng
This paper discusses the first exit and Dirichlet problems of the nonisotropic tempered $\alpha$-stable process $X_t$. Upper bounds on all moments of the first exit position $\left|X_{\tau_D}\right|$ and the first exit time $\tau_D$ are obtained explicitly. It is found that the probability density function of $\left|X_{\tau_D}\right|$ or $\tau_D$ decays exponentially as $\left|X_{\tau_D}\right|$ or $\tau_D$ increases, and that $\mathrm{E}[\tau_D] \sim \mathrm{E}\left[\left|X_{\tau_D}-\mathrm{E}[X_{\tau_D}]\right|^2\right]$ and $\mathrm{E}[\tau_D] \sim \left|\mathrm{E}[X_{\tau_D}]\right|$. Next, we obtain the Feynman–Kac representation of the Dirichlet problem by employing semigroup theory. Furthermore, averaging the generated trajectories of the stochastic process leads to the solution of the Dirichlet problem, which is also verified by numerical experiments.
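The trajectory-averaging step lends itself to a compact Monte Carlo illustration. Below is a minimal sketch that, for simplicity, uses standard Brownian motion on the unit disk in place of the paper's nonisotropic tempered $\alpha$-stable process; the step size, path count, and boundary function are arbitrary choices for the example.

```python
import numpy as np

# Sketch: solve Delta u = 0 on the unit disk D with u = g on the boundary by
# averaging g at the exit points of simulated paths, u(x0) ~ E[g(X_{tau_D})].
# Brownian motion stands in for the tempered alpha-stable process of the paper.
def dirichlet_mc(x0, g, dt=1e-3, n_paths=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    vals = np.empty(n_paths)
    alive = np.arange(n_paths)                 # indices of paths still inside D
    while alive.size:
        x[alive] += np.sqrt(dt) * rng.standard_normal((alive.size, 2))
        r = np.linalg.norm(x[alive], axis=1)
        out = r >= 1.0                         # paths that just exited D
        idx = alive[out]
        vals[idx] = g(x[idx] / r[out, None])   # project exit point onto boundary
        alive = alive[~out]
    return vals.mean()

# g(x, y) = x^2 - y^2 is harmonic, so the estimate should be near g(x0) = 0.05.
print(dirichlet_mc([0.3, 0.2], lambda p: p[:, 0]**2 - p[:, 1]**2))
```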
Some new invariant sum tests and MAD tests for the assessment of Benford’s law
Pub Date: 2024-02-13 | DOI: 10.1007/s00180-024-01463-8
Wolfgang Kössler, Hans-J. Lenz, Xing D. Wang
Benford's law is used worldwide for detecting non-conformance or fraud in numerical data. It states that the significand of a data set drawn from the universe is not uniformly but logarithmically distributed. In particular, the first non-zero digit is 1 with probability $\log_{10} 2 \approx 0.301$. Several tests are available for testing conformance with Benford's law; the best known are Pearson's $\chi^2$-test, the Kolmogorov–Smirnov test, and a modified version of the MAD test. In the present paper we propose some tests; three of the four invariant sum tests are new, and they are motivated by the sum-invariance property of Benford's law. Two distance measures are investigated: the Euclidean and the Mahalanobis distance of the standardized sums to the origin. We use the significands corresponding to the first significant digit as well as to the second significant digit, respectively. Moreover, we suggest improved versions of the MAD test and obtain critical values that are independent of the sample sizes. For illustration, the tests are applied to specifically selected data sets where prior knowledge is available about conformance with Benford's law. Furthermore, we discuss the role of truncation of distributions.
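As a point of reference for the baseline tests named above, here is a minimal sketch of Pearson's $\chi^2$-test of the first significant digit against the Benford probabilities $P(d)=\log_{10}(1+1/d)$; it is not one of the paper's new invariant sum or MAD tests, and the log-uniform sample is a hypothetical example.

```python
import numpy as np
from scipy.stats import chisquare

def benford_first_digit_chi2(data):
    x = np.abs(np.asarray(data, dtype=float))
    d = (x / 10 ** np.floor(np.log10(x))).astype(int)      # first significant digit
    observed = np.bincount(d, minlength=10)[1:10]          # counts of digits 1..9
    expected = len(x) * np.log10(1 + 1 / np.arange(1, 10)) # Benford P(d)
    return chisquare(observed, expected)

# Hypothetical example: log-uniform data conform approximately to Benford's law.
rng = np.random.default_rng(0)
print(benford_first_digit_chi2(10 ** rng.uniform(0, 5, size=2000)))
```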
Convergence of the CUSUM estimation for a mean shift in linear processes with random coefficients
Pub Date: 2024-02-12 | DOI: 10.1007/s00180-024-01465-6
Yi Wu, Wei Wang, Xuejun Wang
Let $\{X_i, 1\le i\le n\}$ be a linear process based on dependent random variables with random coefficients, having a mean shift at an unknown location. The cumulative sum (CUSUM, for short) estimator of the change point is studied. Strong convergence, $L_r$ convergence, complete convergence, and the rate of strong convergence are established for the CUSUM estimator under some mild conditions. These results improve and extend the corresponding ones in the literature. Simulation studies and two real data examples are also provided to support the theoretical results.
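For concreteness, the classical CUSUM change-point estimator for a mean shift is $\hat{k}=\arg\max_{1\le k<n}\left|\sum_{i=1}^{k}X_i-\frac{k}{n}\sum_{i=1}^{n}X_i\right|$. A minimal sketch, applied here to synthetic independent data rather than the dependent linear processes with random coefficients studied in the paper:

```python
import numpy as np

# k_hat = argmax_k | sum_{i<=k} X_i - (k/n) sum_{i<=n} X_i |, k = 1, ..., n-1.
def cusum_changepoint(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n)
    stat = np.abs(np.cumsum(x)[:-1] - k / n * x.sum())
    return int(np.argmax(stat)) + 1            # estimated change-point location

rng = np.random.default_rng(1)
x = np.r_[rng.normal(0, 1, 120), rng.normal(1.5, 1, 80)]   # true shift at i = 120
print(cusum_changepoint(x))
```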
Analysis of estimating the Bayes rule for Gaussian mixture models with a specified missing-data mechanism
Pub Date: 2024-02-10 | DOI: 10.1007/s00180-023-01447-0
Semi-supervised learning approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (Stat Comput 30:1–12, 2020). We show that in a partially classified sample, a classifier using Bayes’ rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model, especially with moderate to low overlap and proportion of missing class labels, or with large overlap but few missing labels. It also outperforms a classifier with no missing-data mechanism regardless of the overlap region or the proportion of missing class labels. Our exploration of two- and three-component normal mixture models with unequal covariances through simulations further corroborates our findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
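For context, a minimal sketch of the classical Bayes rule of allocation in the two-class homoscedastic normal model, without the paper's missing-data mechanism; the parameter values in the example are hypothetical.

```python
import numpy as np

# Allocate x to class 1 iff the posterior log-odds of class 1 are positive:
# log(pi1/pi0) + (mu1 - mu0)^T Sigma^{-1} (x - (mu0 + mu1)/2) > 0.
def bayes_allocate(x, mu0, mu1, Sigma, pi1=0.5):
    w = np.linalg.solve(Sigma, mu1 - mu0)
    log_odds = np.log(pi1 / (1 - pi1)) + (x - 0.5 * (mu0 + mu1)) @ w
    return int(log_odds > 0)

# Hypothetical two-class example with a shared covariance matrix.
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(bayes_allocate(np.array([1.5, 0.8]), mu0, mu1, Sigma))
```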
Finite mixture of regression models for censored data based on the skew-t distribution
Pub Date: 2024-02-10 | DOI: 10.1007/s00180-024-01459-4
Jiwon Park, Dipak K. Dey, Víctor H. Lachos
Finite mixture models have been widely used to model and analyze data from heterogeneous populations. In practical scenarios, these types of data often confront upper and/or lower detection limits due to the constraints imposed by experimental apparatuses. Additional complexity arises when measures of each mixture component significantly deviate from the normal distribution, simultaneously manifesting characteristics such as multimodality, asymmetry, and heavy-tailed behavior. This paper introduces a flexible model tailored for censored data to address these intricacies, leveraging the finite mixture of skew-t distributions. An Expectation Conditional Maximization Either (ECME) algorithm is developed to efficiently derive parameter estimates by iteratively maximizing the observed-data log-likelihood function. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated skew-t distributions. Moreover, a method based on general information principles is presented for approximating the asymptotic covariance matrix of the estimators. Results obtained from the analysis of both simulated and real datasets demonstrate the proposed method's effectiveness.
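To illustrate the E-step/M-step structure that ECME extends, here is a deliberately simplified sketch: one EM iteration for a two-component normal mixture with no censoring and no skewness. The paper's closed-form E-step, which relies on moments of truncated skew-$t$ distributions, is considerably more involved.

```python
import numpy as np
from scipy.stats import norm

# One EM iteration for a two-component normal mixture (no censoring, no skewness).
def em_step(x, w, mu, sd):
    dens = np.stack([w[j] * norm.pdf(x, mu[j], sd[j]) for j in (0, 1)])
    r = dens / dens.sum(axis=0)                        # E-step: responsibilities
    w = r.mean(axis=1)                                 # M-step: mixing weights
    mu = (r * x).sum(axis=1) / r.sum(axis=1)           # M-step: component means
    sd = np.sqrt((r * (x - mu[:, None])**2).sum(axis=1) / r.sum(axis=1))
    return w, mu, sd

rng = np.random.default_rng(2)
x = np.r_[rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)]  # synthetic mixture
params = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
for _ in range(50):
    params = em_step(x, *params)
print(params)
```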
A simulation model to analyze the behavior of a faculty retirement plan: a case study in Mexico
Pub Date: 2024-02-09 | DOI: 10.1007/s00180-024-01456-7
Marco Antonio Montufar-Benítez, Jaime Mora-Vargas, Carlos Arturo Soto-Campos, Gilberto Pérez-Lechuga, José Raúl Castro-Esparza
The main goal of this study was to determine confidence intervals for the average age, average seniority, and average money savings of faculty members in a university retirement system using a simulation model. The simulation, built in Arena, considers age, seniority, and the probability of continuing at the institution as the main random input variables of the model. An annual interest rate of 7% and an average annual salary increase of 3% were assumed. The simulated scenario consisted of both the teacher and the university making contributions: the faculty member 5% of their salary, and the university a further 5% of the teacher's salary. Since the base salaries with which teachers join the university vary, we considered a monthly salary of MXN 23 181.2, corresponding to full-time teachers with middle salaries. The results obtained from a simulation of 30 replicates showed that the confidence intervals were (55.0, 55.2) years for the average age at retirement, (22.1, 22.3) years for the average seniority, and (329 795.2, 341 287.0) MXN for the average savings amount. Moreover, the risk that a retiree aged 62 or older with more than 25 years of service is still alive after their savings run out is approximately 98%, and the savings are exhausted at age 64.
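The deterministic core of the stated contribution scheme is easy to reproduce. The sketch below compounds the 5% employee plus 5% employer contributions at the 7% annual rate with 3% annual raises; monthly compounding is our assumption, and the Arena model additionally randomizes age, seniority, and retention.

```python
# Savings accumulation: 5% employee + 5% employer contributions, 7% annual
# interest with (assumed) monthly compounding, 3% annual raises.
def savings(years, salary=23181.2, rate=0.07, annual_raise=0.03, contrib=0.10):
    balance = 0.0
    for _ in range(years):
        for _ in range(12):
            balance = balance * (1 + rate / 12) + contrib * salary
        salary *= 1 + annual_raise
    return balance

print(f"Balance after 22 years of service: MXN {savings(22):,.1f}")
```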
Fitting concentric elliptical shapes under general model
Pub Date: 2024-02-09 | DOI: 10.1007/s00180-024-01460-x
Fitting concentric ellipses is a crucial yet challenging task in image processing, pattern recognition, and astronomy. To address this complexity, researchers have introduced simplified models by imposing geometric assumptions. These assumptions enable the linearization of the model through reparameterization, allowing for the extension of various fitting methods. However, these restrictive assumptions often fail to hold in real-world scenarios, limiting their practical applicability. In this work, we propose two novel estimators that relax these assumptions: the Least Squares method (LS) and the Gradient Algebraic Fit (GRAF). Since these methods are iterative, we provide numerical implementations and strategies for obtaining reliable initial guesses. Moreover, we employ perturbation theory to conduct a first-order analysis, deriving the leading terms of their Mean Squared Errors and their theoretical lower bounds. Our theoretical findings reveal that the GRAF is statistically efficient, while the LS method is not. We further validate our theoretical results and the performance of the proposed estimators through a series of numerical experiments on both real and synthetic data.
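As a baseline for comparison, a minimal sketch of a plain algebraic least-squares fit of a single conic, solved as the smallest right singular vector of the design matrix; this is neither the paper's LS nor its GRAF estimator for concentric ellipses, and the noisy sample is hypothetical.

```python
import numpy as np

# Algebraic fit of a x^2 + b xy + c y^2 + d x + e y + f = 0: minimize ||A p||
# subject to ||p|| = 1, i.e. take the right singular vector of the smallest
# singular value of the design matrix A.
def fit_conic(x, y):
    A = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    return np.linalg.svd(A)[2][-1]

# Hypothetical noisy points on the ellipse x^2/9 + y^2/4 = 1.
rng = np.random.default_rng(3)
t = rng.uniform(0, 2 * np.pi, 200)
x = 3 * np.cos(t) + 0.01 * rng.normal(size=200)
y = 2 * np.sin(t) + 0.01 * rng.normal(size=200)
print(fit_conic(x, y))
```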
Exploring local explanations of nonlinear models using animated linear projections
Pub Date: 2024-01-31 | DOI: 10.1007/s00180-023-01453-2
Nicholas Spyrison, Dianne Cook, Przemyslaw Biecek
The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI), which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model uses predictors to arrive at a prediction. These provide a point estimate of the linear variable importance in the vicinity of a single observation. However, LVAs tend not to handle association between predictors effectively. To understand how the interaction between predictors affects the variable importance estimate, we can convert LVAs into linear projections and use the radial tour. This is also useful for learning how a model has made a mistake, the effect of outliers, or the clustering of observations. The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models. The methods are implemented in the R package cheem, available on CRAN.
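The conversion at the heart of the approach is simple: normalize the attribution vector to unit length and use it as a one-dimensional projection basis. A minimal numpy sketch with hypothetical values (the cheem package automates this and animates the radial tour):

```python
import numpy as np

lva = np.array([0.62, -0.21, 0.10, 0.05])   # hypothetical attribution at one point
basis = lva / np.linalg.norm(lva)           # unit-norm 1-D projection basis

# Projecting the data onto the attribution direction gives the view of the data
# "as seen" by the local explanation; the radial tour then rotates this basis.
X = np.random.default_rng(4).normal(size=(100, 4))
proj = X @ basis
print(proj[:5])
```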
Semiparametric regression modelling of current status competing risks data: a Bayesian approach
Pub Date: 2024-01-31 | DOI: 10.1007/s00180-024-01455-8
Pavithra Hariharan, P. G. Sankaran
Current status censoring arises in survival analysis when the exact event times are not known, but each individual is monitored once for their survival status. Current status data often arise in medical research from situations that involve multiple causes of failure. For examining current status competing risks data, commonly encountered in epidemiological studies and clinical trials, Bayesian methods are more advantageous than conventional approaches: they excel at integrating prior knowledge with the observed data and deliver accurate results even with small samples. Motivated by these advantages, the present study pioneers a Bayesian framework for both modelling and analysis of current status competing risks data with covariates. By means of the proportional hazards model, estimation procedures for the regression parameters and cumulative incidence functions are established under appropriate prior distributions. The posterior computation is performed using an adaptive Metropolis–Hastings algorithm. Methods for comparing and validating models are also devised. The finite-sample characteristics of the estimators are assessed through simulation studies. The practical efficacy of this Bayesian approach is demonstrated through an application to prostate cancer clinical trial data.
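For orientation, a minimal sketch of a plain (non-adaptive) random-walk Metropolis–Hastings sampler of the kind underlying the posterior computation; the bivariate normal target below is a hypothetical stand-in for the actual posterior of the regression parameters.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=5000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)  # random-walk move
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:               # accept/reject
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws

# Hypothetical target: standard bivariate normal log-density, up to a constant.
print(metropolis_hastings(lambda th: -0.5 * th @ th, np.zeros(2)).mean(axis=0))
```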
Differentiated uniformization: a new method for inferring Markov chains on combinatorial state spaces including stochastic epidemic models
Pub Date: 2024-01-26 | DOI: 10.1007/s00180-024-01454-9
Kevin Rupp, Rudolf Schill, Jonas Süskind, Peter Georg, Maren Klever, Andreas Lösch, Lars Grasedyck, Tilo Wettig, Rainer Spang
We consider continuous-time Markov chains that describe the stochastic evolution of a dynamical system by a transition-rate matrix $Q$ which depends on a parameter $\theta$. Computing the probability distribution over states at time $t$ requires the matrix exponential $\exp(tQ)$, and inferring $\theta$ from data requires its derivative $\partial \exp(tQ)/\partial \theta$. Both are challenging to compute when the state space, and hence the size of $Q$, is huge. This can happen when the state space consists of all combinations of the values of several interacting discrete variables. Often it is even impossible to store $Q$. However, when $Q$ can be written as a sum of tensor products, computing $\exp(tQ)$ becomes feasible by the uniformization method, which does not require explicit storage of $Q$. Here we provide an analogous algorithm for computing $\partial \exp(tQ)/\partial \theta$, the differentiated uniformization method. We demonstrate our algorithm for the stochastic SIR model of epidemic spread, for which we show that $Q$ can be written as a sum of tensor products. We estimate monthly infection and recovery rates during the first wave of the COVID-19 pandemic in Austria and quantify their uncertainty in a full Bayesian analysis. Implementation and data are available at https://github.com/spang-lab/TenSIR.
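For orientation, plain (undifferentiated) uniformization on a small dense $Q$: choose $\Lambda \ge \max_i |q_{ii}|$, set $P = I + Q/\Lambda$, and evaluate $p_0^\top \exp(tQ) = e^{-\Lambda t}\sum_{k\ge 0}\frac{(\Lambda t)^k}{k!}\,p_0^\top P^k$. The sketch below truncates the series at a fixed number of terms; the paper's differentiated, tensor-structured version is not shown.

```python
import numpy as np
from scipy.linalg import expm

# p0^T exp(tQ) for a CTMC generator Q (rows sum to 0), truncating the Poisson-
# weighted series after n_terms terms.
def uniformization(Q, p0, t, n_terms=200):
    lam = np.max(-np.diag(Q))          # uniformization rate Lambda
    P = np.eye(Q.shape[0]) + Q / lam   # stochastic transition matrix
    v = np.asarray(p0, dtype=float)    # holds p0^T P^k
    weight = np.exp(-lam * t)          # Poisson(Lambda t) weight for k = 0
    out = weight * v
    for k in range(1, n_terms):
        v = v @ P
        weight *= lam * t / k
        out += weight * v
    return out

# Two-state check against the dense matrix exponential.
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
p0 = np.array([1.0, 0.0])
print(uniformization(Q, p0, 0.7), p0 @ expm(0.7 * Q))
```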