
Computational Statistics: latest articles

Finite mixture of regression models for censored data based on the skew-t distribution
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-02-10, DOI: 10.1007/s00180-024-01459-4
Jiwon Park, Dipak K. Dey, Víctor H. Lachos

Finite mixture models have been widely used to model and analyze data from heterogeneous populations. In practical scenarios, these types of data are often subject to upper and/or lower detection limits due to the constraints imposed by experimental apparatuses. Additional complexity arises when measures of each mixture component significantly deviate from the normal distribution, simultaneously manifesting characteristics such as multimodality, asymmetry, and heavy-tailed behavior. This paper introduces a flexible model tailored for censored data to address these intricacies, leveraging the finite mixture of skew-t distributions. An Expectation Conditional Maximization Either (ECME) algorithm is developed to efficiently derive parameter estimates by iteratively maximizing the observed data log-likelihood function. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated skew-t distributions. Moreover, a method based on general information principles is presented for approximating the asymptotic covariance matrix of the estimators. Results obtained from the analysis of both simulated and real datasets demonstrate the proposed method’s effectiveness.
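To make the role of detection limits concrete, the following is a minimal sketch of an observed-data log-likelihood for censored measurements. It is not the authors' method: it assumes a single normal component in place of a mixture of skew-t components, and the function names and the `"obs"`/`"left"`/`"right"` status labels are hypothetical. Fully observed values contribute a density term, while censored values contribute the probability of lying beyond the detection limit.

```python
import math


def norm_logpdf(y, mu, sigma):
    """Log-density of a normal distribution (stand-in for a skew-t component)."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(y, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(records, mu, sigma):
    """Observed-data log-likelihood for records of the form (value, status).

    status is "obs" for an exact measurement, "left" when the true value
    is at or below the lower detection limit `value`, and "right" when it
    is above the upper detection limit `value` (hypothetical labels).
    """
    total = 0.0
    for value, status in records:
        if status == "obs":
            total += norm_logpdf(value, mu, sigma)
        elif status == "left":
            total += math.log(norm_cdf(value, mu, sigma))
        elif status == "right":
            total += math.log(1.0 - norm_cdf(value, mu, sigma))
        else:
            raise ValueError(f"unknown censoring status: {status!r}")
    return total
```

An EM-type algorithm such as ECME increases this kind of observed-data log-likelihood iteratively; the sketch only evaluates it for given parameters.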

Citations: 0
A simulation model to analyze the behavior of a faculty retirement plan: a case study in Mexico
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-02-09, DOI: 10.1007/s00180-024-01456-7
Marco Antonio Montufar-Benítez, Jaime Mora-Vargas, Carlos Arturo Soto-Campos, Gilberto Pérez-Lechuga, José Raúl Castro-Esparza

The main goal of this study was to determine confidence intervals for the average age, average seniority, and average savings of faculty members in a university retirement system using a simulation model. The simulation, built in Arena, considers age, seniority, and the probability of continuing in the institution as the main input random variables. An annual interest rate of 7% and an average annual salary increase of 3% were considered. In the simulated scenario, both the teacher and the university make contributions: the faculty member contributes 5% of his salary and the university contributes 5% of the teacher’s salary. Since the base salaries with which teachers join the university are variable, we considered a monthly salary of MXN 23 181.2, corresponding to full-time teachers with middle salaries. The results obtained by a simulation of 30 replicates showed that the confidence intervals for the average age at retirement were (55.0, 55.2) years, for the average seniority (22.1, 22.3) years, and for the average savings amount (329 795.2, 341 287.0) MXN. Moreover, the risk that a retiree aged 62 with more than 25 years of work is still alive after his savings run out is approximately 98%, and this happens at 64 years of age.
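The accumulation rule described above can be sketched deterministically. This is an illustration only, not the Arena model: it ignores the random inputs (age, seniority, probability of staying), and it assumes annual compounding with contributions credited at year end, which the abstract does not state. The function name and parameters are hypothetical.

```python
def simulate_savings(monthly_salary, years, interest=0.07,
                     salary_growth=0.03, contribution=0.10):
    """Accumulate retirement savings over a number of whole years.

    contribution=0.10 combines the 5% paid by the faculty member and the
    5% paid by the university. Annual compounding and year-end
    contributions are assumptions made for this sketch.
    """
    balance = 0.0
    salary = monthly_salary
    for _ in range(years):
        balance *= 1.0 + interest                # interest on prior balance
        balance += 12 * salary * contribution    # this year's contributions
        salary *= 1.0 + salary_growth            # raise for the next year
    return balance
```

For example, `simulate_savings(23181.2, 22)` gives the balance after 22 years of service under these assumptions; the figure will differ from the paper's intervals because the stochastic elements are left out.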

Citations: 0
Fitting concentric elliptical shapes under general model
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-02-09, DOI: 10.1007/s00180-024-01460-x

Fitting concentric ellipses is a crucial yet challenging task in image processing, pattern recognition, and astronomy. To address this complexity, researchers have introduced simplified models by imposing geometric assumptions. These assumptions enable the linearization of the model through reparameterization, allowing for the extension of various fitting methods. However, these restrictive assumptions often fail to hold in real-world scenarios, limiting their practical applicability. In this work, we propose two novel estimators that relax these assumptions: the Least Squares method (LS) and the Gradient Algebraic Fit (GRAF). Since these methods are iterative, we provide numerical implementations and strategies for obtaining reliable initial guesses. Moreover, we employ perturbation theory to conduct a first-order analysis, deriving the leading terms of their Mean Squared Errors and their theoretical lower bounds. Our theoretical findings reveal that the GRAF is statistically efficient, while the LS method is not. We further validate our theoretical results and the performance of the proposed estimators through a series of numerical experiments on both real and synthetic data.

Citations: 0
Exploring local explanations of nonlinear models using animated linear projections
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-01-31, DOI: 10.1007/s00180-023-01453-2
Nicholas Spyrison, Dianne Cook, Przemyslaw Biecek

The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI), which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model uses predictors to arrive at a prediction. These provide a point estimate of the linear variable importance in the vicinity of a single observation. However, LVAs tend not to effectively handle association between predictors. To understand how the interaction between predictors affects the variable importance estimate, we can convert LVAs into linear projections and use the radial tour. This is also useful for learning how a model has made a mistake, or the effect of outliers, or the clustering of observations. The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models. The methods are implemented in the R package cheem, available on CRAN.
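One way to read "convert LVAs into linear projections" is to normalize an attribution vector to unit length and use it as a one-dimensional projection basis. The sketch below shows only that step, in Python rather than R; it does not use the cheem package, it does not implement the radial tour, and the function names are hypothetical.

```python
import math


def attribution_to_basis(attribution):
    """Scale a local variable attribution vector to unit length."""
    norm = math.sqrt(sum(a * a for a in attribution))
    if norm == 0.0:
        raise ValueError("attribution vector is all zeros")
    return [a / norm for a in attribution]


def project(rows, basis):
    """Project each observation (a list of predictor values) onto the basis."""
    return [sum(x * b for x, b in zip(row, basis)) for row in rows]
```

A radial tour would then vary the contribution of one predictor in this basis and redraw the projection at each step, which is what makes the sensitivity of the explanation visible.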

Citations: 0
Semiparametric regression modelling of current status competing risks data: a Bayesian approach
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-01-31, DOI: 10.1007/s00180-024-01455-8
Pavithra Hariharan, P. G. Sankaran

Current status censoring arises in survival analysis when the exact event times are not known, but each individual is monitored once for their survival status. Current status data often arise in medical research, in situations that involve multiple causes of failure. For examining current status competing risks data, commonly encountered in epidemiological studies and clinical trials, Bayesian methods are more advantageous than conventional approaches. They excel in integrating prior knowledge with the observed data and delivering accurate results even with small samples. Inspired by these advantages, the present study pioneers a Bayesian framework for both modelling and analysis of current status competing risks data together with covariates. By means of the proportional hazards model, estimation procedures for the regression parameters and cumulative incidence functions are established assuming appropriate prior distributions. The posterior computation is performed using an adaptive Metropolis–Hastings algorithm. Methods for comparing and validating models have been devised. An assessment of the finite sample characteristics of the estimators is conducted through simulation studies. Through the application of this Bayesian approach to prostate cancer clinical trial data, its practical efficacy is demonstrated.
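The structure of current status data can be illustrated with a small likelihood sketch. Each individual contributes a monitoring time and an indicator of whether the event had already occurred by then, so the likelihood involves only the distribution function at the monitoring time. The sketch assumes a single cause of failure, no covariates, and an exponential event time, all simplifications that are not in the paper; the function name is hypothetical.

```python
import math


def current_status_loglik(rate, records):
    """Log-likelihood of an exponential rate under current status censoring.

    records is a list of (monitor_time, event_occurred) pairs. An event
    seen by the monitoring time contributes log F(c); otherwise the
    contribution is log(1 - F(c)) = -rate * c.
    """
    total = 0.0
    for monitor_time, event_occurred in records:
        if event_occurred:
            total += math.log(1.0 - math.exp(-rate * monitor_time))
        else:
            total += -rate * monitor_time
    return total
```

In a Bayesian analysis, a function of this kind would be added to a log-prior to give the unnormalized log-posterior that a Metropolis–Hastings sampler evaluates at each proposal.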

Citations: 0
Differentiated uniformization: a new method for inferring Markov chains on combinatorial state spaces including stochastic epidemic models
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-01-26, DOI: 10.1007/s00180-024-01454-9
Kevin Rupp, Rudolf Schill, Jonas Süskind, Peter Georg, Maren Klever, Andreas Lösch, Lars Grasedyck, Tilo Wettig, Rainer Spang

We consider continuous-time Markov chains that describe the stochastic evolution of a dynamical system by a transition-rate matrix Q which depends on a parameter \(\theta\). Computing the probability distribution over states at time t requires the matrix exponential \(\exp(tQ)\), and inferring \(\theta\) from data requires its derivative \(\partial \exp(tQ)/\partial \theta\). Both are challenging to compute when the state space and hence the size of Q is huge. This can happen when the state space consists of all combinations of the values of several interacting discrete variables. Often it is even impossible to store Q. However, when Q can be written as a sum of tensor products, computing \(\exp(tQ)\) becomes feasible by the uniformization method, which does not require explicit storage of Q. Here we provide an analogous algorithm for computing \(\partial \exp(tQ)/\partial \theta\), the differentiated uniformization method. We demonstrate our algorithm for the stochastic SIR model of epidemic spread, for which we show that Q can be written as a sum of tensor products. We estimate monthly infection and recovery rates during the first wave of the COVID-19 pandemic in Austria and quantify their uncertainty in a full Bayesian analysis. Implementation and data are available at https://github.com/spang-lab/TenSIR.
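As a rough illustration of the standard (undifferentiated) uniformization method mentioned above, the sketch below computes the state distribution at time t as a Poisson-weighted sum of powers of the matrix P = I + Q/γ, where γ is at least the largest exit rate. It is not the TenSIR implementation: it stores Q as a dense list of lists, which is exactly what the tensor-product formulation avoids, and the function name, tolerance, and term cap are assumptions of this sketch.

```python
import math


def uniformization(p0, Q, t, tol=1e-12, max_terms=10_000):
    """Approximate p0 * exp(tQ) for a row vector p0 and rate matrix Q."""
    n = len(p0)
    gamma = max(-Q[i][i] for i in range(n))   # uniformization rate
    if gamma == 0.0 or t == 0.0:
        return list(p0)
    # Transition matrix of the uniformized discrete-time chain.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / gamma for j in range(n)]
         for i in range(n)]
    weight = math.exp(-gamma * t)             # Poisson probability of 0 jumps
    v = list(p0)
    result = [weight * x for x in v]
    mass = weight                             # accumulated Poisson mass
    k = 0
    # Add terms until the neglected Poisson tail is below tol. The cap
    # guards against exp(-gamma * t) underflowing for very large gamma * t.
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= gamma * t / k
        mass += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result
```

Only products of a vector with P are needed, never the matrix exponential itself, which is why the method remains usable when Q is available only as an operator.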

Citations: 0
A new approach to nonparametric estimation of multivariate spectral density function using basis expansion
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-01-20, DOI: 10.1007/s00180-023-01451-4
Shirin Nezampour, Alireza Nematollahi, Robert T. Krafty, Mehdi Maadooliat

This paper develops a nonparametric method for estimating the spectral density of multivariate stationary time series using basis expansion. A likelihood-based approach is used to fit the model through the minimization of a penalized Whittle negative log-likelihood. Then, a Newton-type algorithm is developed for the computation. In this method, we smooth the Cholesky factors of the multivariate spectral density matrix in a way that the reconstructed estimate based on the smoothed Cholesky components is consistent and positive-definite. In a simulation study, we illustrate our proposed method and compare it with other competitive approaches. Finally, we apply our approach to two real-world problems: electroencephalogram signal analysis and the El Niño cycle.
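The reason for working with Cholesky factors can be shown with a small sketch for a bivariate series at a single frequency. Any lower-triangular factor L with positive real diagonal yields a Hermitian positive-definite matrix f = L Lᴴ, so smoothing the entries of L across frequencies cannot break positive-definiteness. The function name and the 2×2 restriction are assumptions for illustration; the paper's basis expansion and penalized Whittle fit are not reproduced here.

```python
def spectral_from_cholesky(l11, l21, l22):
    """Rebuild a 2x2 spectral matrix f = L L^H from its Cholesky factor.

    L = [[l11, 0], [l21, l22]] with real l11, l22 > 0 and complex l21.
    """
    if l11 <= 0 or l22 <= 0:
        raise ValueError("diagonal entries of the factor must be positive")
    l21 = complex(l21)
    f11 = l11 * l11
    f21 = l21 * l11
    f12 = f21.conjugate()                  # Hermitian symmetry
    f22 = abs(l21) ** 2 + l22 * l22
    return [[complex(f11), f12], [f21, complex(f22)]]
```

The determinant of the result equals (l11 · l22)², which is strictly positive whenever both diagonal entries are.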

Citations: 0
Censored broken adaptive ridge regression in high-dimension
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-01-17, DOI: 10.1007/s00180-023-01446-1
Jeongjin Lee, Taehwa Choi, Sangbum Choi

Broken adaptive ridge (BAR) is a penalized regression method that performs variable selection via a computationally scalable surrogate to \(L_0\) regularization. The BAR regression has many appealing features; it converges to selection with \(L_0\) penalties as a result of reweighting \(L_2\) penalties, and satisfies the oracle property with grouping effect for highly correlated covariates. In this paper, we investigate the BAR procedure for variable selection in a semiparametric accelerated failure time model with complex high-dimensional censored data. Coupled with Buckley-James-type responses, BAR-based variable selection procedures can be performed when event times are censored in complex ways, such as right-censored, left-censored, or double-censored. Our approach utilizes a two-stage cyclic coordinate descent algorithm to minimize the objective function by iteratively estimating the pseudo survival response and regression coefficients along the direction of coordinates. Under some weak regularity conditions, we establish both the oracle property and the grouping effect of the proposed BAR estimator. Numerical studies are conducted to investigate the finite-sample performance of the proposed algorithm and an application to real data is provided as a data example.
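The reweighting idea behind BAR can be sketched for a single coefficient. The sketch assumes an uncensored response and an orthonormal design, so each coefficient can be updated separately from its least-squares estimate z; it does not include the Buckley-James step or the coordinate descent algorithm of the paper. The function name, the use of the same tuning value for the initial ridge fit, and the iteration count are assumptions. Each step solves a ridge problem whose penalty is divided by the square of the previous estimate.

```python
def bar_orthonormal(z, lam, n_iter=200):
    """Iterate the reweighted ridge update for one coefficient.

    Minimizing (beta - z)^2 + lam * beta^2 / beta_old^2 over beta gives
    beta = z * beta_old^2 / (beta_old^2 + lam).
    """
    beta = z / (1.0 + lam)        # ridge estimate as the starting value
    for _ in range(n_iter):
        beta = z * beta * beta / (beta * beta + lam)
    return beta
```

Small least-squares estimates are driven exactly to zero, while large ones settle at a nonzero fixed point that solves beta² − z·beta + lam = 0, which is how a sequence of \(L_2\) penalties ends up behaving like a selection penalty.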

Citations: 0
High-dimensional penalized Bernstein support vector classifier
IF 1.3, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY, Pub Date: 2024-01-16, DOI: 10.1007/s00180-023-01448-z
Rachid Kharoubi, Abdallah Mkhadri, Karim Oualkacha

The support vector machine (SVM) is a powerful classifier used for binary classification to improve the prediction accuracy. However, the nondifferentiability of the SVM hinge loss function can lead to computational difficulties in high-dimensional settings. To overcome this problem, we rely on the Bernstein polynomial and propose a new smoothed version of the SVM hinge loss called the Bernstein support vector machine (BernSVC). This extension is suitable for the high dimension regime. As the BernSVC objective loss function is twice differentiable everywhere, we propose two efficient algorithms for computing the solution of the penalized BernSVC. The first algorithm is based on coordinate descent with the maximization-majorization principle and the second algorithm is the iterative reweighted least squares-type algorithm. Under standard assumptions, we derive a cone condition and a restricted strong convexity to establish an upper bound for the weighted lasso BernSVC estimator. By using a local linear approximation, we extend the latter result to the penalized BernSVC with nonconvex penalties SCAD and MCP. Our bound holds with high probability and achieves the so-called fast rate under mild conditions on the design matrix. Simulation studies are considered to illustrate the prediction accuracy of BernSVC relative to its competitors and also to compare the performance of the two algorithms in terms of computational timing and error estimation. The use of the proposed method is illustrated through analysis of three large-scale real data examples.
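To show what smoothing the hinge loss means, the sketch below compares the hinge loss with a quadratically smoothed variant. This is a swapped-in, generic smoothing and not the Bernstein polynomial construction of the paper: it is only once differentiable, whereas the abstract states that the BernSVC loss is twice differentiable. The function names and the width parameter `delta` are hypothetical.

```python
def hinge(u):
    """Standard hinge loss of the margin u = y * f(x); kinked at u = 1."""
    return max(0.0, 1.0 - u)


def smoothed_hinge(u, delta=0.5):
    """Hinge loss with the kink replaced by a quadratic piece of width delta."""
    if u >= 1.0:
        return 0.0
    if u <= 1.0 - delta:
        return 1.0 - u - delta / 2.0           # linear part, shifted down
    return (1.0 - u) ** 2 / (2.0 * delta)      # quadratic part near u = 1
```

Both pieces agree in value and slope at u = 1 − delta, so gradient-based solvers such as coordinate descent can be applied without special handling of the kink.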

Citations: 0
Random forest based quantile-oriented sensitivity analysis indices estimation
IF 1.3 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-12 DOI: 10.1007/s00180-023-01450-5
Kévin Elie-Dit-Cosaque, Véronique Maume-Deschamps

We propose a random forest based estimation procedure for Quantile-Oriented Sensitivity Analysis (QOSA). For the procedure to be efficient, a cross-validation step on the leaf size of the trees is required. Our full estimation procedure is tested on both simulated data and a real dataset. Our estimators use either the bootstrap samples or the original sample in the estimation. Also, they are based either on a quantile plug-in procedure (the R-estimators) or on a direct minimization (the Q-estimators). This leads to 8 different estimators, which are compared in simulations. From these simulations, it seems that the estimation method based on direct minimization performs better than the quantile plug-in method. This is a significant result because the method with direct minimization requires only one sample and could therefore be preferred.
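The contrast between plugging in a quantile and minimizing directly rests on a standard fact: the alpha-quantile minimizes the expected check (pinball) loss. The Python sketch below illustrates that fact on a plain, unconditional sample. It is not the authors' R- or Q-estimators, it involves no random forest, and the function names are hypothetical choices for this example.

```python
import math
import random


def check_loss(u: float, alpha: float) -> float:
    """Check (pinball) loss at level alpha for the residual u = y - theta."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def plug_in_quantile(sample: list[float], alpha: float) -> float:
    """Empirical alpha-quantile: the order statistic y_(k) with k = ceil(alpha * n)."""
    ordered = sorted(sample)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]


def direct_minimization_quantile(sample: list[float], alpha: float) -> float:
    """Sample point that minimizes the total check loss over the sample.

    The objective is piecewise linear and convex in theta, so a minimizer
    can always be found among the sample points themselves.
    """
    def total_loss(theta: float) -> float:
        return sum(check_loss(y - theta, alpha) for y in sample)

    return min(sorted(sample), key=total_loss)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(101)]
    print(plug_in_quantile(data, 0.7))
    print(direct_minimization_quantile(data, 0.7))
```

When alpha times the sample size is not an integer and the sample values are distinct, the two functions return the same order statistic. The brute-force search costs quadratic time in the sample size and is meant only to make the equivalence visible.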

Citations: 0