Bayesian taut splines for estimating the number of modes
José E. Chacón, Javier Fernández Serrano
Pub Date: 2024-04-15 | DOI: 10.1016/j.csda.2024.107961
The number of modes in a probability density function is representative of the complexity of a model and can also be viewed as the number of subpopulations. Despite its relevance, there has been limited research in this area. A novel approach to estimating the number of modes in the univariate setting is presented, focusing on prediction accuracy and inspired by some overlooked aspects of the problem: the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view that blends local and global density properties. The technique combines flexible kernel estimators and parsimonious compositional splines in the Bayesian inference paradigm, providing soft solutions and incorporating expert judgment. The procedure includes feature exploration, model selection, and mode testing, illustrated in a sports analytics case study showcasing multiple companion visualisation tools. A thorough simulation study also demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, the new method emerges as a top-tier alternative, offering innovative solutions for analysts.
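The bandwidth sensitivity of classical mode counting is easy to see in code. Below is a minimal sketch (assuming Python with NumPy/SciPy) of the modality-driven baseline the paper compares against, namely counting local maxima of a kernel density estimate on a grid; it is not the authors' Bayesian taut-spline method, and the two-component mixture used as test data is purely illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two subpopulations, so the true number of modes is 2.
sample = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])

def count_kde_modes(data, bw_factor, grid_size=512):
    """Count local maxima of a Gaussian KDE evaluated on a fine grid."""
    kde = gaussian_kde(data, bw_method=bw_factor)
    grid = np.linspace(data.min() - 1, data.max() + 1, grid_size)
    dens = kde(grid)
    # Interior grid points strictly above both neighbours are local maxima.
    is_peak = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(is_peak.sum())

# The answer depends strongly on the bandwidth: small bandwidths create
# spurious modes, large ones merge genuine subpopulations.
for bw in (0.05, 0.2, 1.0):
    print(bw, count_kde_modes(sample, bw))
```

This instability under a purely local criterion is one motivation for the paper's holistic, uncertainty-aware treatment of modes.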
{"title":"Bayesian taut splines for estimating the number of modes","authors":"José E. Chacón , Javier Fernández Serrano","doi":"10.1016/j.csda.2024.107961","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107961","url":null,"abstract":"<div><p>The number of modes in a probability density function is representative of the complexity of a model and can also be viewed as the number of subpopulations. Despite its relevance, there has been limited research in this area. A novel approach to estimating the number of modes in the univariate setting is presented, focusing on prediction accuracy and inspired by some overlooked aspects of the problem: the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view that blends local and global density properties. The technique combines flexible kernel estimators and parsimonious compositional splines in the Bayesian inference paradigm, providing soft solutions and incorporating expert judgment. The procedure includes feature exploration, model selection, and mode testing, illustrated in a sports analytics case study showcasing multiple companion visualisation tools. A thorough simulation study also demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, the new method emerges as a top-tier alternative, offering innovative solutions for analysts.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000458/pdfft?md5=9c9dde675ebe359be2107f0ce88120f0&pid=1-s2.0-S0167947324000458-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140605592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler
Jiayu Qian, Yuanyuan Liu, Jingya Yang, Qingping Zhou
Pub Date: 2024-04-10 | DOI: 10.1016/j.csda.2024.107930
Bayesian inference with a deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The prior distribution is learned from the available measurements, so its selection amounts to an important representation-learning step. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art alternatives, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.
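As background for the sampler's name, the following sketch shows a plain preconditioned Crank-Nicolson (pCN) proposal under a standard Gaussian prior, one of the two building blocks of the paper's HMC-pCN algorithm (the HMC component is omitted). The log-likelihood is a hypothetical stand-in; in the paper it would come from the SA-Roundtrip generator composed with the CT forward operator.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(z):
    # Hypothetical data-misfit term used only for this sketch.
    return -0.5 * np.sum((z - 1.0) ** 2)

def pcn_chain(dim, n_steps=5000, beta=0.2):
    """pCN Metropolis-Hastings under a standard Gaussian prior on z."""
    z = rng.standard_normal(dim)
    samples = np.empty((n_steps, dim))
    for i in range(n_steps):
        # pCN proposal: it preserves the Gaussian prior, so the acceptance
        # ratio involves only the likelihood terms.
        prop = np.sqrt(1.0 - beta**2) * z + beta * rng.standard_normal(dim)
        if np.log(rng.uniform()) < log_likelihood(prop) - log_likelihood(z):
            z = prop
        samples[i] = z
    return samples

chain = pcn_chain(dim=4)
print(chain[1000:].mean(axis=0))  # posterior mean after burn-in, about 0.5 here
```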
{"title":"Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler","authors":"Jiayu Qian , Yuanyuan Liu , Jingya Yang , Qingping Zhou","doi":"10.1016/j.csda.2024.107930","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107930","url":null,"abstract":"<div><p>Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140555566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sufficient dimension reduction for a novel class of zero-inflated graphical models
Eric Koplin, Liliana Forzani, Diego Tomassi, Ruth M. Pfeiffer
Pub Date: 2024-04-08 | DOI: 10.1016/j.csda.2024.107959
Graphical models allow modeling of complex dependencies among components of a random vector. In many applications of graphical models, however, such as microbiome data, the data contain an excess number of zero values. New pairwise graphical models with distributions in an exponential family are presented that accommodate excess numbers of zeros in the random vector components. First, these multivariate distributions are characterized in terms of univariate conditional distributions. Then predictors that arise from such a pairwise graphical model with excess zeros are modeled as functions of an outcome, and the corresponding first-order sufficient dimension reduction (SDR) is derived. That is, linear combinations of the predictors that contain all the information for the regression of the outcome as a function of the predictors are obtained. To incorporate variable selection, the SDR is estimated using a pseudo-likelihood with a hierarchical penalty that prioritizes sparse interactions only for variables associated with the outcome. These methods yield consistent estimators of the reduction and can be applied to continuous or categorical outcomes. The new methods are illustrated by studying normal, Poisson and truncated Poisson graphical models with excess zeros in simulations and by analyzing microbiome data from the American Gut Project. The models provided robust variable selection, and the predictive performance of the Poisson zero-inflated pairwise graphical model was equal to or better than that of other available methods for the analysis of microbiome data.
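For readers unfamiliar with first-order SDR, the sketch below runs classical sliced inverse regression (SIR) on simulated single-index data. SIR is a standard first-order SDR estimator used here only as a generic reference, not the paper's penalized pseudo-likelihood method for zero-inflated graphical models; all data-generating choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 6
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = np.tanh(X @ beta_true) + 0.1 * rng.standard_normal(n)

def sir_direction(X, y, n_slices=10):
    """Leading SIR direction: top eigenvector of the between-slice covariance."""
    Xc = X - X.mean(axis=0)
    # Whiten: with L such that L L' = Sigma^{-1}, Z = Xc L has identity covariance.
    L = np.linalg.cholesky(np.linalg.inv(np.cov(Xc, rowvar=False)))
    Z = Xc @ L
    order = np.argsort(y)
    M = sum(len(idx) / len(y) * np.outer(Z[idx].mean(axis=0), Z[idx].mean(axis=0))
            for idx in np.array_split(order, n_slices))
    eigvals, eigvecs = np.linalg.eigh(M)
    b = L @ eigvecs[:, -1]          # map back to the original coordinates
    return b / np.linalg.norm(b)

print(np.round(sir_direction(X, y), 2))  # ~ +/- beta_true / ||beta_true||
```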
{"title":"Sufficient dimension reduction for a novel class of zero-inflated graphical models","authors":"Eric Koplin , Liliana Forzani , Diego Tomassi , Ruth M. Pfeiffer","doi":"10.1016/j.csda.2024.107959","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107959","url":null,"abstract":"<div><p>Graphical models allow modeling of complex dependencies among components of a random vector. In many applications of graphical models, however, for example microbiome data, the data have an excess number of zero values. New pairwise graphical models with distributions in an exponential family are presented, that accommodate excess numbers of zeros in the random vector components. First these multivariate distributions are characterized in terms of univariate conditional distributions. Then predictors that arise from such a pairwise graphical model with excess zeros are modeled as functions of an outcome, and the corresponding first order sufficient dimension reduction (SDR) is derived. That is, linear combinations of the predictors that contain all the information for the regression of the outcome as a function of the predictors are obtained. To incorporate variable selection, the SDR is estimated using a pseudo-likelihood with a hierarchical penalty that prioritizes sparse interactions only for variables associated with the outcome. These methods yield consistent estimators of the reduction and can be applied to continuous or categorical outcomes. The new methods are then illustrated by studying normal, Poisson and truncated Poisson graphical models with excess zeros in simulations and by analyzing microbiome data from the American Gut Project. The models provided robust variable selection and the predictive performance of the Poisson zero-inflated pairwise graphical model was equal or better than that of other available methods for the analysis of microbiome data.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140619396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mixture of logistic skew-normal multinomial models
Wangshu Tu, Ryan Browne, Sanjeena Subedi
Pub Date: 2024-04-03 | DOI: 10.1016/j.csda.2024.107946
The logistic normal multinomial distribution is gaining interest in modelling microbiome data. It utilizes a hierarchical structure such that the observed counts conditional on the compositions are assumed to be multinomial random variables and the log-ratio transformed compositions are assumed to follow a Gaussian distribution. While the multinomial distribution accounts for the compositional nature of the data, and a Gaussian prior offers flexibility in the structure of covariance matrices, the log-ratio transformed compositions of microbiome data can be highly skewed, especially at lower taxonomic levels. Thus, a Gaussian distribution may not be an ideal prior for the log-ratio transformed compositions. A novel mixture of logistic skew-normal multinomial (LSNM) distributions is proposed in which a multivariate skew-normal distribution is utilized as a prior for the log-ratio transformed compositions. A variational Gaussian approximation in conjunction with the EM algorithm is utilized for parameter estimation.
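A small generative sketch of the logistic normal multinomial hierarchy may help fix ideas: latent log-ratio coordinates are drawn from a Gaussian (the paper's contribution replaces this with a skew-normal mixture), mapped to compositions by the inverse additive log-ratio transform, and counts are then multinomial. All dimensions and parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4                                  # number of taxa; K - 1 log-ratio coordinates
mu = np.array([0.5, -0.5, 1.0])        # illustrative Gaussian parameters
Sigma = 0.3 * np.eye(K - 1)

def sample_counts(n_samples, depth=1000):
    Y = rng.multivariate_normal(mu, Sigma, size=n_samples)   # latent log-ratios
    # Inverse additive log-ratio transform, with the last taxon as reference.
    expY = np.exp(np.column_stack([Y, np.zeros(n_samples)]))
    comps = expY / expY.sum(axis=1, keepdims=True)           # compositions
    return np.array([rng.multinomial(depth, p) for p in comps])

print(sample_counts(5))
```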
{"title":"A mixture of logistic skew-normal multinomial models","authors":"Wangshu Tu , Ryan Browne , Sanjeena Subedi","doi":"10.1016/j.csda.2024.107946","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107946","url":null,"abstract":"<div><p>The logistic normal multinomial distribution is gaining interest in modelling microbiome data. It utilizes a hierarchical structure such that the observed counts conditional on the compositions are assumed to be multinomial random variables and the log-ratio transformed compositions are assumed to be from a Gaussian distribution. While multinomial distribution accounts for the compositional nature of the data, and a Gaussian prior offers flexibility in the structure of covariance matrices, the log-ratio transformed compositions of the microbiome data can be highly skewed, especially at a lower taxonomic level. Thus, a Gaussian distribution may not be an ideal prior for the log-ratio transformed compositions. A novel mixture of logistic skew-normal multinomial (LSNM) distribution is proposed in which a multivariate skew-normal distribution is utilized as a prior for the log-ratio transformed compositions. A variational Gaussian approximation in conjunction with the EM algorithm is utilized for parameter estimation.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140607145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semiparametric accelerated failure time models under unspecified random effect distributions
Byungtae Seo, Il Do Ha
Pub Date: 2024-03-27 | DOI: 10.1016/j.csda.2024.107958
Accelerated failure time (AFT) models with random effects, a useful alternative to frailty models, have been widely used for analyzing clustered (or correlated) time-to-event data. In the AFT model, the distribution of the unobserved random effect is conventionally assumed to be parametric, often modeled as a normal distribution. Although it is known that a misspecified random-effect distribution has little effect on regression parameter estimates, in some cases the impact of such misspecification is not negligible. Particularly when our focus extends to quantities associated with random effects, the problem can become worse. In this paper, we propose a semi-parametric maximum likelihood approach in which the random-effect distribution under the AFT model is left unspecified. We provide a feasible algorithm to estimate the random-effect distribution as well as the model parameters. Through comprehensive simulation studies, our results demonstrate the effectiveness of this proposed method across a range of random-effect distribution types (discrete or continuous) and under conditions of heavy censoring. The efficacy of the approach is further illustrated through real-world data examples.
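To make the motivating setting concrete, the following sketch simulates clustered AFT data whose cluster-level random effect is deliberately bimodal rather than normal, with heavy right-censoring; it generates the kind of data for which a misspecified normal random-effect assumption can matter. All parameter choices are illustrative and no fitting is performed.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = 0.8
n_clusters, cluster_size = 50, 10

# Bimodal cluster-level random effects: a mixture of two separated normals.
b = np.where(rng.uniform(size=n_clusters) < 0.5,
             rng.normal(-1.5, 0.3, n_clusters),
             rng.normal(1.5, 0.3, n_clusters))

x = rng.standard_normal((n_clusters, cluster_size))
log_t = beta * x + b[:, None] + rng.normal(0.0, 0.5, (n_clusters, cluster_size))
t = np.exp(log_t)

# Independent right-censoring tuned to be heavy.
c = rng.exponential(np.median(t), size=t.shape)
time = np.minimum(t, c)          # observed follow-up time
status = (t <= c).astype(int)    # 1 = event, 0 = censored
print(f"censoring rate: {1 - status.mean():.2f}")
```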
{"title":"Semiparametric accelerated failure time models under unspecified random effect distributions","authors":"Byungtae Seo , Il Do Ha","doi":"10.1016/j.csda.2024.107958","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107958","url":null,"abstract":"<div><p>Accelerated failure time (AFT) models with random effects, a useful alternative to frailty models, have been widely used for analyzing clustered (or correlated) time-to-event data. In the AFT model, the distribution of the unobserved random effect is conventionally assumed to be parametric, often modeled as a normal distribution. Although it has been known that a misspecfied random-effect distribution has little effect on regression parameter estimates, in some cases, the impact caused by such misspecification is not negligible. Particularly when our focus extends to quantities associated with random effects, the problem could become worse. In this paper, we propose a semi-parametric maximum likelihood approach in which the random-effect distribution under the AFT models is left unspecified. We provide a feasible algorithm to estimate the random-effect distribution as well as model parameters. Through comprehensive simulation studies, our results demonstrate the effectiveness of this proposed method across a range of random-effect distribution types (discrete or continuous) and under conditions of heavy censoring. The efficacy of the approach is further illustrated through simulation studies and real-world data examples.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variational Bayesian approach for analyzing interval-censored data under the proportional hazards model
Wenting Liu, Huiqiong Li, Niansheng Tang, Jun Lyu
Pub Date: 2024-03-26 | DOI: 10.1016/j.csda.2024.107957
Interval-censored failure time data frequently occur in medical follow-up studies, among other settings, and include right-censored data as a special case. Their analysis is much more difficult than that of right-censored data due to their more complicated structure and the absence of a partial likelihood. This article presents a variational Bayesian (VB) approach for analyzing such data under a proportional hazards model. The VB approach obtains a direct approximation of the posterior density. Compared to Markov chain Monte Carlo (MCMC)-based sampling approaches, the VB approach achieves enhanced computational efficiency without sacrificing estimation accuracy. An extensive simulation study comparing the performance of the proposed method with two main Bayesian methods currently available in the literature and the classic proportional hazards model indicates that it works well in practical situations.
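The likelihood being approximated is easy to write down: under a proportional hazards model, a subject observed in the interval (L, R] contributes S(L|x) - S(R|x). The sketch below makes this concrete with an assumed Weibull baseline hazard and plain maximum likelihood in place of the paper's variational Bayes machinery.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, shape0, scale0, beta0 = 500, 1.5, 2.0, np.array([0.7])
X = rng.standard_normal((n, 1))

# Draw event times from the Weibull PH model by inverting S(t | x) = U.
U = rng.uniform(size=n)
T = scale0 * (-np.log(U) / np.exp(X @ beta0)) ** (1.0 / shape0)

# Two inspection times per subject create the observed intervals (L, R].
insp = np.sort(rng.uniform(0.5, 4.0, size=(n, 2)), axis=1)
L = np.where(T <= insp[:, 0], 0.0, np.where(T <= insp[:, 1], insp[:, 0], insp[:, 1]))
R = np.where(T <= insp[:, 0], insp[:, 0], np.where(T <= insp[:, 1], insp[:, 1], np.inf))

def neg_log_lik(params, L, R, X):
    shape, scale, beta = np.exp(params[0]), np.exp(params[1]), params[2:]
    risk = np.exp(X @ beta)
    S = lambda t: np.exp(-((t / scale) ** shape) * risk)  # conditional survival
    # Right-censored subjects (R = inf) contribute S(L) - 0.
    S_R = np.where(np.isinf(R), 0.0, S(np.where(np.isinf(R), 1.0, R)))
    return -np.sum(np.log(S(L) - S_R + 1e-12))

fit = minimize(neg_log_lik, x0=np.zeros(3), args=(L, R, X))
print(fit.x[2])  # should be close to beta0 = 0.7
```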
{"title":"Variational Bayesian approach for analyzing interval-censored data under the proportional hazards model","authors":"Wenting Liu , Huiqiong Li , Niansheng Tang , Jun Lyu","doi":"10.1016/j.csda.2024.107957","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107957","url":null,"abstract":"<div><p>Interval-censored failure time data frequently occur in medical follow-up studies among others and include right-censored data as a special case. Their analysis is much difficult than the analysis of the right-censored data due to their much more complicated structures and no partial likelihood. This article presents a variational Bayesian (VB) approach for analyzing such data under a proportional hazards model. The VB approach obtains a direct approximation of the posterior density. Compared to the Markov chain Monte Carlo (MCMC)-based sampling approaches, the VB approach achieves enhanced computational efficiency without sacrificing estimation accuracy. An extensive simulation study is conducted to compare the performance of the proposed methods with two main Bayesian methods currently available in the literature and the classic proportional hazards model and indicates that they work well in practical situations.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140328151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable selection in Bayesian multiple instance regression using shotgun stochastic search
Seongoh Park, Joungyoun Kim, Xinlei Wang, Johan Lim
Pub Date: 2024-03-24 | DOI: 10.1016/j.csda.2024.107954
In multiple instance learning (MIL), a bag represents a sample that contains a set of instances, each of which is described by a vector of explanatory variables, but the entire bag has only one label/response. Though many methods for MIL have been developed to date, few have paid attention to the interpretability of models and results. The proposed Bayesian regression model rests on two levels of hierarchy, which transparently show how explanatory variables explain, and instances contribute to, bag responses. Moreover, two selection problems are addressed simultaneously: instance selection, to identify the instances in each bag responsible for the bag response, and variable selection, to search for the important covariates. To explore the joint discrete space of indicator variables created for the selection of both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit the MIL context. The proposed model also offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. A simulation study shows the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data to predict binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods and can identify variables relevant in modeling responses.
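The search component generalizes beyond MIL, so the sketch below shows a generic shotgun stochastic search over binary inclusion indicators, scoring models by a simple BIC for linear regression instead of the paper's hierarchical MIL posterior, and using a single-flip neighbourhood for brevity (SSS variants typically also include swap moves).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3]] = [1.5, -2.0]                 # only columns 0 and 3 are active
y = X @ beta + rng.standard_normal(n)

def score(gamma):
    """Negative BIC of the least-squares fit on the selected columns."""
    k = int(gamma.sum())
    if k == 0:
        rss = np.sum((y - y.mean()) ** 2)
    else:
        Xg = X[:, gamma.astype(bool)]
        coef, res, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        rss = res[0] if res.size else np.sum((y - Xg @ coef) ** 2)
    return -(n * np.log(rss / n) + k * np.log(n))

gamma = np.zeros(p, dtype=int)
best, best_score = gamma.copy(), score(gamma)
for _ in range(200):
    # Single-flip neighbourhood; sample the next model proportionally to score.
    nbrs = [gamma.copy() for _ in range(p)]
    for j in range(p):
        nbrs[j][j] ^= 1
    scores = np.array([score(g) for g in nbrs])
    probs = np.exp(scores - scores.max())
    gamma = nbrs[rng.choice(p, p=probs / probs.sum())]
    if score(gamma) > best_score:
        best, best_score = gamma.copy(), score(gamma)
print(best)  # should select the truly active columns 0 and 3
```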
{"title":"Variable selection in Bayesian multiple instance regression using shotgun stochastic search","authors":"Seongoh Park , Joungyoun Kim , Xinlei Wang , Johan Lim","doi":"10.1016/j.csda.2024.107954","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107954","url":null,"abstract":"<div><p>In multiple instance learning (MIL), a bag represents a sample that has a set of instances, each of which is described by a vector of explanatory variables, but the entire bag only has one label/response. Though many methods for MIL have been developed to date, few have paid attention to interpretability of models and results. The proposed Bayesian regression model stands on two levels of hierarchy, which transparently show how explanatory variables explain and instances contribute to bag responses. Moreover, two selection problems are simultaneously addressed; the instance selection to find out the instances in each bag responsible for the bag response, and the variable selection to search for the important covariates. To explore a joint discrete space of indicator variables created for selection of both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit in the MIL context. Also, the proposed model offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. The simulation study shows the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data for prediction of binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods, and can identify variables relevant in modeling responses.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140351163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-task learning regression via convex clustering
Akira Okazaki, Shuichi Kawano
Pub Date: 2024-03-24 | DOI: 10.1016/j.csda.2024.107956
Multi-task learning (MTL) is a methodology that aims to improve the general performance of estimation and prediction by sharing common information among related tasks. In MTL, several assumptions can be made about the relationships among tasks, along with methods to incorporate them. A natural assumption in practical situations is that tasks fall into clusters according to their characteristics. Under this assumption, the group fused regularization approach clusters the tasks by shrinking the differences among them, enabling the transfer of common information within the same cluster. However, this approach also transfers information between different clusters, which worsens estimation and prediction. To overcome this problem, an MTL method is proposed with centroid parameters representing the cluster centers of the tasks. Because this model separates the parameters for regression from the parameters for clustering, estimation and prediction accuracy for the regression coefficient vectors are improved. The effectiveness of the proposed method is shown through Monte Carlo simulations and applications to real data.
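A toy version of the centroid idea is sketched below: each task keeps its own coefficient vector, which is shrunk toward a task-specific centroid, and the centroids are pulled toward one another. For simplicity this sketch uses squared (ridge-type) penalties and plain gradient descent, whereas the paper's convex-clustering penalty yields exact fusion of centroids into clusters; all sizes and tuning values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n, p = 6, 80, 5
# Two latent clusters of tasks sharing the same true coefficient vectors.
w_true = np.where(np.arange(T)[:, None] < 3, 1.0, -1.0) * np.ones((T, p))
Xs = [rng.standard_normal((n, p)) for _ in range(T)]
ys = [Xs[t] @ w_true[t] + 0.3 * rng.standard_normal(n) for t in range(T)]

W = np.zeros((T, p))          # per-task regression coefficients
U = np.zeros((T, p))          # per-task centroids
lam1, lam2, lr = 1.0, 0.5, 1e-3
for _ in range(2000):
    for t in range(T):
        resid = ys[t] - Xs[t] @ W[t]
        W[t] -= lr * (-2.0 * Xs[t].T @ resid / n + 2.0 * lam1 * (W[t] - U[t]))
    # Each centroid is pulled toward its task and toward the other centroids.
    U -= lr * (2.0 * lam1 * (U - W) + 2.0 * lam2 * (T * U - U.sum(axis=0)))
print(np.round(U, 2))  # centroids should split into two groups
```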
{"title":"Multi-task learning regression via convex clustering","authors":"Akira Okazaki , Shuichi Kawano","doi":"10.1016/j.csda.2024.107956","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107956","url":null,"abstract":"<div><p>Multi-task learning (MTL) is a methodology that aims to improve the general performance of estimation and prediction by sharing common information among related tasks. In the MTL, there are several assumptions for the relationships and methods to incorporate them. One of the natural assumptions in the practical situation is that tasks are classified into some clusters with their characteristics. For this assumption, the group fused regularization approach performs clustering of the tasks by shrinking the difference among tasks. This enables the transfer of common information within the same cluster. However, this approach also transfers the information between different clusters, which worsens the estimation and prediction. To overcome this problem, an MTL method is proposed with a centroid parameter representing a cluster center of the task. Because this model separates parameters into the parameters for regression and the parameters for clustering, estimation and prediction accuracy for regression coefficient vectors are improved. The effectiveness of the proposed method is shown through Monte Carlo simulations and applications to real data.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000409/pdfft?md5=67ff220c9ae2e0cf144b79296e79f566&pid=1-s2.0-S0167947324000409-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140290730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new algorithm for inference in HMM's with lower span complexity
Diogo Pereira, Cláudia Nunes, Rui Rodrigues
Pub Date: 2024-03-20 | DOI: 10.1016/j.csda.2024.107955
The maximum likelihood problem for Hidden Markov Models is usually solved numerically by the Baum-Welch algorithm, which uses the Expectation-Maximization algorithm to find the parameter estimates. This algorithm has a recursion depth equal to the data sample size and cannot be computed in parallel, which limits the ability of modern GPUs to speed up computation. A new algorithm is proposed that provides the same estimates as the Baum-Welch algorithm, requiring about the same number of iterations, but is designed in such a way that it can be parallelized. As a consequence, it leads to a significant reduction in computation time. This reduction is illustrated by means of numerical examples, where we consider simulated data as well as real datasets.
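For reference, here is the sequential scaled forward recursion that underlies the classical Baum-Welch algorithm: each step depends on the previous one, so the recursion depth equals the sample size, which is exactly the bottleneck the proposed parallelizable algorithm removes. The two-state parameterization is hypothetical.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """HMM log-likelihood via the scaled forward algorithm.

    pi: (K,) initial distribution; A: (K, K) transitions; B: (K, M) emissions.
    """
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:                     # inherently sequential recursion
        alpha = (alpha @ A) * B[:, o]
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik

# A two-state example with a hypothetical parameterization.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
obs = np.array([0, 1, 1, 0, 1])
print(forward_log_likelihood(obs, pi, A, B))
```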
{"title":"A new algorithm for inference in HMM's with lower span complexity","authors":"Diogo Pereira , Cláudia Nunes , Rui Rodrigues","doi":"10.1016/j.csda.2024.107955","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107955","url":null,"abstract":"<div><p>The maximum likelihood problem for Hidden Markov Models is usually numerically solved by the Baum-Welch algorithm, which uses the Expectation-Maximization algorithm to find the estimates of the parameters. This algorithm has a recursion depth equal to the data sample size and cannot be computed in parallel, which limits the use of modern GPUs to speed up computation time. A new algorithm is proposed that provides the same estimates as the Baum-Welch algorithm, requiring about the same number of iterations, but is designed in such a way that it can be parallelized. As a consequence, it leads to a significant reduction in the computation time. This reduction is illustrated by means of numerical examples, where we consider simulated data as well as real datasets.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000392/pdfft?md5=f5b9ec83440b072fb6330eb5106ddb15&pid=1-s2.0-S0167947324000392-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140190987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A semiparametric model for the cause-specific hazard under risk proportionality
Simon M.S. Lo, Ralf A. Wilke, Takeshi Emura
Pub Date: 2024-03-19 | DOI: 10.1016/j.csda.2024.107953
Semiparametric Cox proportional hazards models enjoy great popularity in empirical survival analysis. A semiparametric model for cause-specific hazards under a proportionality restriction across risks is considered, which has desirable practical properties such as estimation by partial likelihood and an analytical solution for the copula-graphic estimator. The cause-specific and marginal hazards are shown to share functional form restrictions in this case. The model for the cause-specific hazard can be used for inference about parametric restrictions on the marginal hazard without the risk of misspecifying the latter and without knowing the risk dependence. After the class of parametric marginal hazards has been determined, it can be estimated in conjunction with the degree of risk dependence. Finite-sample properties are investigated with simulations. An application to employment duration demonstrates the practicality of the approach.
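One concrete consequence of proportionality across risks can be checked in a few lines: if the cause-specific hazards satisfy h1(t) = phi * h(t) and h2(t) = (1 - phi) * h(t) for all t, then the failure cause is independent of the failure time, and phi is consistently estimated by the observed fraction of cause-1 events even under independent censoring. The Weibull overall hazard below is assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n, phi = 100000, 0.3
# Overall failure time with a Weibull hazard (illustrative choice); under
# proportionality the cause is an independent Bernoulli(phi) label.
T = 2.0 * rng.weibull(1.5, n)
cause = np.where(rng.uniform(size=n) < phi, 1, 2)

c = rng.uniform(0.0, 4.0, n)                    # independent censoring
observed_cause = np.where(T <= c, cause, 0)     # 0 marks a censored spell
events = observed_cause > 0
print(f"phi_hat = {(observed_cause == 1).sum() / events.sum():.3f} (true {phi})")
```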
{"title":"A semiparametric model for the cause-specific hazard under risk proportionality","authors":"Simon M.S. Lo , Ralf A. Wilke , Takeshi Emura","doi":"10.1016/j.csda.2024.107953","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107953","url":null,"abstract":"<div><p>Semiparametric Cox proportional hazards models enjoy great popularity in empirical survival analysis. A semiparametric model for cause-specific hazards under a proportionality restriction across risks is considered, which has desired practical properties such as estimation by partial likelihood and an analytical solution for the copula-graphic estimator. The cause-specific and marginal hazards are shown to share functional form restrictions in this case. The model for the cause-specific hazard can be used for inference about parametric restrictions on the marginal hazard without the risk of misspecifying the latter and without knowing the risk dependence. After the class of parametric marginal hazards has been determined, it can be estimated in conjunction with the degree of risk dependence. Finite sample properties are investigated with simulations. An application to employment duration demonstrates the practicality of the approach.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000379/pdfft?md5=4bc502f302c73b799bcbf656f0393576&pid=1-s2.0-S0167947324000379-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140190988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}