
Statistics and Computing: latest articles

Systemic infinitesimal over-dispersion on graphical dynamic models
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-02 DOI: 10.1007/s11222-024-10443-3
Ning Ning, Edward Ionides

Stochastic models for collections of interacting populations have crucial roles in many scientific fields such as epidemiology, ecology, performance engineering, and queueing theory, to name a few. However, the standard approach to extending an ordinary differential equation model to a Markov chain does not have sufficient flexibility in the mean-variance relationship to match data. To handle that, we develop new approaches using Dirichlet noise to construct collections of independent or dependent noise processes. This permits the modeling of high-frequency variation in transition rates both within and between the populations under study. Our theory is developed in a general framework of time-inhomogeneous Markov processes equipped with a general graphical structure. We demonstrate our approach on a widely analyzed measles dataset, adding Dirichlet noise to a classical Susceptible–Exposed–Infected–Recovered model. Our methodology shows improved statistical fit measured by log-likelihood and provides new insights into the dynamics of this biological system.

Citations: 0
Greedy recursive spectral bisection for modularity-bound hierarchical divisive community detection
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-27 DOI: 10.1007/s11222-024-10451-3
Douglas O. Cardoso, João Domingos Gomes da Silva Junior, Carla Silva Oliveira, Celso Marques, Laura Silva de Assis

Spectral clustering techniques depend on the eigenstructure of a similarity matrix to assign data points to clusters, so that points within the same cluster exhibit high similarity and are compared to those in different clusters. This work aimed to develop a spectral method that could be compared to clustering algorithms that represent the current state of the art. This investigation conceived a novel spectral clustering method, as well as five policies that guide its execution, based on spectral graph theory and embodying hierarchical clustering principles. Computational experiments comparing the proposed method with six state-of-the-art algorithms were undertaken in this study to evaluate the clustering methods under scrutiny. The assessment was performed using two evaluation metrics, specifically the adjusted Rand index, and modularity. The obtained results furnish compelling evidence, indicating that the proposed method is competitive and possesses distinctive properties compared to those elucidated in the existing literature. This suggests that our approach stands as a viable alternative, offering a robust choice within the spectrum of available same-purpose tools.
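
The idea of assigning points to clusters from the eigenstructure of a graph matrix can be illustrated with the classical Fiedler-vector bisection. This is only a minimal sketch of that textbook technique, not the greedy recursive method or the five policies proposed in the paper; the toy graph and the function names `spectral_bisection` and `modularity` are assumptions made for illustration.

```python
import numpy as np

def spectral_bisection(adjacency: np.ndarray) -> np.ndarray:
    """Split a connected undirected graph in two using the sign of the Fiedler vector."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler >= 0).astype(int)

def modularity(adjacency: np.ndarray, labels: np.ndarray) -> float:
    """Newman modularity of a labelling of an undirected, unweighted graph."""
    two_m = adjacency.sum()                      # twice the number of edges
    degrees = adjacency.sum(axis=1)
    same_cluster = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(((adjacency - expected) * same_cluster).sum() / two_m)

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bisection(A)
score = modularity(A, labels)
```

A hierarchical divisive scheme would apply such a split recursively to each part and stop when a quality measure such as modularity no longer improves.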

Citations: 0
Flexible Bayesian quantile regression based on the generalized asymmetric Huberised-type distribution
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-27 DOI: 10.1007/s11222-024-10453-1
Weitao Hu, Weiping Zhang

To enhance the robustness and flexibility of Bayesian quantile regression models using the asymmetric Laplace or asymmetric Huberised-type (AH) distribution, which lacks changeable mode, diminishing influence of outliers, and asymmetry under median regression, we propose a new generalized AH distribution which is achieved through a hierarchical mixture representation, thus leading to a flexible Bayesian Huberised quantile regression framework. With many parameters in the model, we develop an efficient Markov chain Monte Carlo procedure based on the Metropolis-within-Gibbs sampling algorithm. The robustness and flexibility of the new distribution are examined through intensive simulation studies and application to two real data sets.
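
As background, the asymmetric Laplace working likelihood mentioned above corresponds to the familiar check (pinball) loss of quantile regression. The sketch below shows only that standard loss and that its minimiser over a sample is the sample quantile; it does not implement the generalized AH distribution or the sampler of the paper, and the data and names are assumptions for illustration.

```python
import numpy as np

def check_loss(residuals: np.ndarray, tau: float) -> np.ndarray:
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), evaluated elementwise."""
    return residuals * (tau - (residuals < 0))

# Hypothetical sample: the integers 0, 1, ..., 100.
y = np.arange(101, dtype=float)
tau = 0.25

# Evaluate the total check loss at each candidate location and keep the best one.
candidates = y
losses = np.array([check_loss(y - c, tau).sum() for c in candidates])
best = candidates[np.argmin(losses)]
```

Here `best` coincides with the empirical 0.25-quantile of the sample.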

Citations: 0
Structured prior distributions for the covariance matrix in latent factor models
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-26 DOI: 10.1007/s11222-024-10454-0
Sarah Elizabeth Heaps, Ian Hyla Jermyn

Factor models are widely used for dimension reduction in the analysis of multivariate data. This is achieved through decomposition of a \(p \times p\) covariance matrix into the sum of two components. Through a latent factor representation, they can be interpreted as a diagonal matrix of idiosyncratic variances and a shared variation matrix, that is, the product of a \(p \times k\) factor loadings matrix and its transpose. If \(k \ll p\), this defines a parsimonious factorisation of the covariance matrix. Historically, little attention has been paid to incorporating prior information in Bayesian analyses using factor models where, at best, the prior for the factor loadings is order invariant. In this work, a class of structured priors is developed that can encode ideas of dependence structure about the shared variation matrix. The construction allows data-informed shrinkage towards sensible parametric structures while also facilitating inference over the number of factors. Using an unconstrained reparameterisation of stationary vector autoregressions, the methodology is extended to stationary dynamic factor models. For computational inference, parameter-expanded Markov chain Monte Carlo samplers are proposed, including an efficient adaptive Gibbs sampler. Two substantive applications showcase the scope of the methodology and its inferential benefits.
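
The decomposition described above can be written as a covariance equal to the shared variation matrix (loadings times their transpose) plus a diagonal matrix of idiosyncratic variances. The following is a minimal numerical sketch of that decomposition only; the dimensions, the random draws, and the variable names are assumptions for illustration and say nothing about the structured priors developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 10, 2                                   # hypothetical dimensions with k < p

loadings = rng.normal(size=(p, k))             # p x k factor loadings matrix
idiosyncratic = rng.uniform(0.5, 1.5, size=p)  # positive idiosyncratic variances

shared = loadings @ loadings.T                 # shared variation matrix, rank at most k
covariance = shared + np.diag(idiosyncratic)   # full p x p covariance matrix

# Count parameters to see the parsimony of the factorisation.
params_unstructured = p * (p + 1) // 2         # free entries of a symmetric matrix
params_factor = p * k + p                      # loadings plus idiosyncratic variances
```

With these values the factor form uses 30 parameters against 55 for an unstructured covariance matrix, and the gap widens as \(p\) grows with \(k\) fixed.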

Citations: 0
Birnbaum–Saunders frailty regression models for clustered survival data
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-25 DOI: 10.1007/s11222-024-10458-w
Diego I. Gallardo, Marcelo Bourguignon, José S. Romeo

We present a novel frailty model for clustered survival data. In particular, we consider the Birnbaum–Saunders (BS) distribution for the frailty terms, with a new parameterization expressed directly in terms of the variance of the frailty distribution. This allows, among other things, the estimated frailty terms to be compared with those of traditional models, such as the gamma frailty model. Some mathematical properties of the new model are studied, including the conditional distribution of frailties among the survivors, the frailty of individuals dying at time t, and Kendall's \(\tau\) measure. Furthermore, an explicit form for the derivatives of the Laplace transform of the BS distribution is found using Faà di Bruno's formula. Parametric, non-parametric and semiparametric versions of the BS frailty model are studied. We use a simple Expectation-Maximization (EM) algorithm to estimate the model parameters and evaluate its performance under different censoring proportions in a Monte Carlo simulation study. We also show that the BS frailty model is competitive with the gamma and weighted Lindley frailty models under misspecification. We illustrate our methodology using a real data set.

Citations: 0
A review on the Adaptive-Ridge Algorithm with several extensions
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-25 DOI: 10.1007/s11222-024-10440-6
Rémy Abergel, Olivier Bouaziz, Grégory Nuel

The Adaptive Ridge Algorithm is an iterative algorithm designed for variable selection. It is also known under the denomination of Iteratively Reweighted Least-Squares Algorithm in the communities of Compressed Sensing and Sparse Signals Recovery. Besides, it can also be interpreted as an optimization algorithm dedicated to the minimization of possibly nonconvex \(\ell^q\) penalized energies (with \(0<q<2\)). In the literature, this algorithm can be derived using various mathematical approaches, namely Half Quadratic Minimization, Majorization-Minimization, Alternating Minimization or Local Approximations. In this work, we will show how the Adaptive Ridge Algorithm can be simply derived and analyzed from a single equation, corresponding to a variational reformulation of the \(\ell^q\) penalty. We will describe in detail how the Adaptive Ridge Algorithm can be numerically implemented and we will perform a thorough experimental study of its parameters. We will also show how the variational formulation of the \(\ell^q\) penalty combined with modern duality principles can be used to design an interesting variant of the Adaptive Ridge Algorithm dedicated to the minimization of quadratic functions over (nonconvex) \(\ell^q\) balls.
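
The iteratively reweighted view described above can be sketched for a least-squares data term: each iteration solves a weighted ridge problem and then recomputes the weights from the current coefficients. The weight update used here, \((\beta_j^2 + \delta^2)^{(q-2)/2}\), is a commonly used choice and is an assumption of this sketch, as are the function name `adaptive_ridge`, the smoothing constant `delta`, and the simulated data; the abstract itself does not spell out these details.

```python
import numpy as np

def adaptive_ridge(X, y, lam=1.0, q=0.5, delta=1e-5, n_iter=100):
    """Iteratively reweighted ridge regression approximating an l^q penalty."""
    n_features = X.shape[1]
    weights = np.ones(n_features)          # start from an ordinary ridge fit
    gram, xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # Weighted ridge step: minimise ||y - X b||^2 + lam * sum_j w_j * b_j^2.
        beta = np.linalg.solve(gram + lam * np.diag(weights), xty)
        # Reweighting step (assumed form): w_j * b_j^2 approximates |b_j|^q.
        weights = (beta**2 + delta**2) ** ((q - 2) / 2)
    return beta

# Hypothetical sparse regression problem: only two of five coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=200)

beta_hat = adaptive_ridge(X, y, lam=1.0, q=0.5)
```

In this setting the coefficients of the inactive variables are driven very close to zero while the active ones are only mildly shrunk, which is the variable-selection behaviour the abstract refers to.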

Citations: 0
Enhancing cure rate analysis through integration of machine learning models: a comparative study
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-25 DOI: 10.1007/s11222-024-10456-y
Wisdom Aselisewine, Suvra Pal

Cure rate models have been thoroughly investigated across various domains, encompassing medicine, reliability, and finance. The merging of machine learning (ML) with cure models is emerging as a promising strategy to improve predictive accuracy and gain profound insights into the underlying mechanisms influencing the probability of cure. The current body of literature has explored the benefits of incorporating a single ML algorithm with cure models. However, there is a notable absence of a comprehensive study that compares the performances of various ML algorithms in this context. This paper seeks to address and bridge this gap. Specifically, we focus on the well-known mixture cure model and examine the incorporation of five distinct ML algorithms: extreme gradient boosting, neural networks, support vector machines, random forests, and decision trees. To bolster the robustness of our comparison, we also include cure models with logistic and spline-based regression. For parameter estimation, we formulate an expectation maximization algorithm. A comprehensive simulation study is conducted across diverse scenarios to compare various models based on the accuracy and precision of estimates for different quantities of interest, along with the predictive accuracy of cure. The results derived from both the simulation study and the analysis of real cutaneous melanoma data indicate that the incorporation of ML models into the cure model provides a beneficial contribution to the ongoing endeavors aimed at improving the accuracy of cure rate estimation.
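
For readers unfamiliar with the mixture cure model, its population survival function mixes a cured fraction, which never experiences the event, with the survival function of the susceptible (uncured) subjects. The sketch below assumes a logistic model for the probability of being uncured and an exponential survival function for the susceptibles; both choices, and all names and parameter values, are illustrative assumptions and are not the ML-based models compared in the paper.

```python
import numpy as np

def uncured_probability(x, coef, intercept):
    """Logistic model for the probability that a subject is susceptible (uncured)."""
    return 1.0 / (1.0 + np.exp(-(intercept + x @ coef)))

def population_survival(t, x, coef, intercept, rate):
    """Mixture cure survival: S_pop(t | x) = (1 - pi(x)) + pi(x) * S_u(t)."""
    pi = uncured_probability(x, coef, intercept)
    susceptible_survival = np.exp(-rate * t)   # assumed exponential S_u(t)
    return (1.0 - pi) + pi * susceptible_survival

# Hypothetical covariates and parameters for a single subject.
x = np.array([1.0, -0.5])
coef = np.array([0.8, 0.3])
intercept, rate = 0.2, 0.5

pi = uncured_probability(x, coef, intercept)
s_start = population_survival(0.0, x, coef, intercept, rate)
s_late = population_survival(1000.0, x, coef, intercept, rate)
```

The survival curve starts at one and levels off at the cure probability `1 - pi` instead of decaying to zero; replacing the logistic part with another classifier is the kind of substitution the study compares.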

Citations: 0
Gaussian processes for Bayesian inverse problems associated with linear partial differential equations
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-24 DOI: 10.1007/s11222-024-10452-2
Tianming Bai, Aretha L. Teckentrup, Konstantinos C. Zygalakis

This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inversion. We extend the framework of Raissi et al. (2017) to construct PDE-informed Gaussian priors that we then use to construct different approximate posteriors. A number of different numerical experiments illustrate the superiority of the PDE-informed Gaussian priors over more traditional priors.

Citations: 0
Bounded-memory adjusted scores estimation in generalized linear models with large data sets
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-21 DOI: 10.1007/s11222-024-10447-z
Patrick Zietkiewicz, Ioannis Kosmidis

The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression in particular, is supported by the results of Kosmidis and Firth (Biometrika 108:71–82, 2021. https://doi.org/10.1093/biomet/asaa052), who show that the resulting estimates are always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically as a proportion of the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing O(n) quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. Both procedures can also be readily adapted to fit generalized linear models when distinct parts of the data are stored across different sites and, due to privacy concerns, cannot be fully transferred across sites. We assess the procedures through a real-data application with millions of observations.
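
The role of incremental QR decompositions can be illustrated on a single least-squares solve: the triangular factor of the augmented matrix [X, y] is updated one chunk at a time, so only a small triangular matrix and the current chunk are held in memory. This is a sketch of that generic building block under assumed names (`chunked_least_squares`) and simulated data; it is not the authors' IWLS variants, which additionally involve working weights and adjusted scores.

```python
import numpy as np

def chunked_least_squares(chunks, n_features):
    """Least-squares fit from (X_chunk, y_chunk) pairs via incremental QR updates."""
    # Triangular factor of the augmented matrix [X, y] seen so far (empty at first).
    r_aug = np.zeros((0, n_features + 1))
    for X_chunk, y_chunk in chunks:
        stacked = np.vstack([r_aug, np.column_stack([X_chunk, y_chunk])])
        # Only the R factor is kept; its size does not grow with the number of rows.
        r_aug = np.linalg.qr(stacked, mode="r")
    r = r_aug[:n_features, :n_features]        # R factor of X
    qty = r_aug[:n_features, n_features]       # corresponding part of Q^T y
    return np.linalg.solve(r, qty)

# Hypothetical data, processed in ten chunks of 100 rows each.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=1000)
chunks = [(X[i:i + 100], y[i:i + 100]) for i in range(0, 1000, 100)]

beta_chunked = chunked_least_squares(chunks, n_features=4)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

The chunked estimate agrees with the full-data fit up to floating-point error, while the memory footprint depends on the chunk size and the number of parameters rather than on the number of observations.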

Citations: 0
An efficient workflow for modelling high-dimensional spatial extremes
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-06-19 DOI: 10.1007/s11222-024-10448-y
Silius M. Vandeskog, Sara Martino, Raphaël Huser

We develop a comprehensive methodological workflow for Bayesian modelling of high-dimensional spatial extremes that lets us describe both weakening extremal dependence at increasing levels and changes in the type of extremal dependence class as a function of the distance between locations. This is achieved with a latent Gaussian version of the spatial conditional extremes model that allows for computationally efficient inference with R-INLA. Inference is made more robust using a post hoc adjustment method that accounts for possible model misspecification. This added robustness makes it possible to extract more information from the available data during inference using a composite likelihood. The developed methodology is applied to the modelling of extreme hourly precipitation from high-resolution radar data in Norway. Inference is performed quickly, and the resulting model fit successfully captures the main trends in the extremal dependence structure of the data. The post hoc adjustment is found to further improve model performance.

Citations: 0