Computational Statistics & Data Analysis: Latest Articles

Medoid splits for efficient random forests in metric spaces
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-31 DOI: 10.1016/j.csda.2024.107995
Matthieu Bulté, Helle Sørensen

An adaptation of the random forest algorithm for Fréchet regression is revisited, addressing the challenge of regression with random objects in metric spaces. To overcome the limitations of previous approaches, a new splitting rule is introduced, substituting the computationally expensive Fréchet means with a medoid-based approach. The asymptotic equivalence of this method to Fréchet mean-based procedures is demonstrated, along with the consistency of the associated regression estimator. This approach provides a sound theoretical framework and a more efficient computational solution to Fréchet regression, broadening its application to non-standard data types and complex use cases.
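To make the contrast between Fréchet means and medoids concrete, here is a minimal sketch. It assumes a sample medoid defined as the observed point minimizing the sum of squared distances to the rest of the sample, and a split cost equal to the summed within-node squared distances to each node's medoid. The function names (`medoid`, `within_node_cost`, `split_cost`, `arc_dist`) and the circle example are illustrative assumptions, not the authors' implementation or exact splitting rule.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def medoid(points: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Observed point with the smallest sum of squared distances to the sample."""
    return min(points, key=lambda c: sum(dist(c, x) ** 2 for x in points))


def within_node_cost(points: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Sum of squared distances to the node's medoid (0.0 for an empty node)."""
    if not points:
        return 0.0
    m = medoid(points, dist)
    return sum(dist(m, x) ** 2 for x in points)


def split_cost(left: Sequence[T], right: Sequence[T],
               dist: Callable[[T, T], float]) -> float:
    """Hypothetical split criterion: total within-node cost of the two children."""
    return within_node_cost(left, dist) + within_node_cost(right, dist)


def arc_dist(a: float, b: float) -> float:
    """Arc-length distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


# Responses are angles on a circle, a simple non-Euclidean metric space.
angles = [0.0, 0.1, 0.2, 3.0]
center = medoid(angles, arc_dist)
cost = split_cost(angles[:3], angles[3:], arc_dist)
```

The medoid search only needs pairwise distances among observed points, so no optimization over the whole metric space is required, which is where the computational saving over a Fréchet mean would come from.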

Computational Statistics & Data Analysis, volume 198, Article 107995. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324000793/pdfft?md5=90ce48cb2e6d039f213ac81b5b60098d&pid=1-s2.0-S0167947324000793-main.pdf
Citations: 0
Consistent skinny Gibbs in probit regression
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-27 DOI: 10.1016/j.csda.2024.107993
Jiarong Ouyang, Xuan Cao

Spike and slab priors have emerged as effective and computationally scalable tools for Bayesian variable selection in high-dimensional linear regression. However, model selection consistency and efficient computational strategies for spike and slab priors in probit regression have rarely been investigated. A hierarchical probit model with continuous spike and slab priors over the regression coefficients is considered, and a highly scalable Gibbs sampler whose computational complexity grows only linearly in the dimension of the predictors is proposed. Specifically, the “Skinny Gibbs” algorithm is adapted to the setting of probit and negative binomial regression, and model selection consistency of the proposed method under the probit model is established when the number of covariates is allowed to grow much larger than the sample size. Through simulation studies, the method is shown to achieve superior empirical performance compared with other state-of-the-art methods. Gene expression data from 51 asthmatic and 44 non-asthmatic samples are analyzed, and the performance of the proposed approach in predicting asthma is compared with that of existing approaches.

Computational Statistics & Data Analysis, volume 198, Article 107993.
Citations: 0
Online bootstrap inference for the geometric median
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-23 DOI: 10.1016/j.csda.2024.107992
Guanghui Cheng, Qiang Xiong, Ruitao Lin

In real-world applications, the geometric median is a natural quantity to consider for robust inference of location or central tendency, particularly when dealing with non-standard or irregular data distributions. An innovative online bootstrap inference algorithm, using the averaged nonlinear stochastic gradient algorithm, is proposed to make statistical inference about the geometric median from massive datasets. The method is computationally fast and memory-friendly, and it is easy to update as new data is received sequentially. The validity of the proposed online bootstrap inference is theoretically justified. Simulation studies under a variety of scenarios are conducted to demonstrate its effectiveness and efficiency in terms of computation speed and memory usage. Additionally, the online inference procedure is applied to a large publicly available dataset for skin segmentation.
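The point estimator underlying this kind of online procedure can be sketched as an averaged stochastic gradient recursion for the geometric median, i.e. the minimizer of the expected Euclidean distance to the data. The sketch below assumes the standard recursion (a step of decreasing size along the unit vector toward each new observation, followed by a running average of the iterates). The function name, the step-size constants `c` and `alpha`, and the simulated data are illustrative assumptions, and the bootstrap perturbation step of the paper is not reproduced.

```python
import math
import random
from typing import Iterable, List, Sequence


def averaged_sgd_geometric_median(stream: Iterable[Sequence[float]], dim: int,
                                  c: float = 1.0, alpha: float = 0.75) -> List[float]:
    """One pass over the stream; memory use is O(dim), independent of sample size."""
    m = [0.0] * dim        # current SGD iterate
    m_bar = [0.0] * dim    # running average of the iterates (the estimate)
    for n, x in enumerate(stream, start=1):
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            step = c * n ** (-alpha)  # assumed step size c * n^(-alpha)
            m = [mi + step * d / norm for mi, d in zip(m, diff)]
        m_bar = [mb + (mi - mb) / n for mb, mi in zip(m_bar, m)]
    return m_bar


# Simulated data: Gaussian noise around the point (1, -2).
random.seed(0)
data = [(1.0 + random.gauss(0, 1), -2.0 + random.gauss(0, 1)) for _ in range(5000)]
est = averaged_sgd_geometric_median(data, dim=2)
```

Each observation is used once and then discarded, which is what makes the update cheap when data arrive sequentially.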

Computational Statistics & Data Analysis, volume 197, Article 107992.
Citations: 0
Spectral co-clustering in multi-layer directed networks
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-23 DOI: 10.1016/j.csda.2024.107987
Wenqing Su, Xiao Guo, Xiangyu Chang, Ying Yang

Modern network analysis often involves multi-layer network data in which the nodes are aligned, and the edges on each layer represent one of the multiple relations among the nodes. Current literature on multi-layer network data is mostly limited to undirected relations. However, directed relations are more common and may introduce extra information. This study focuses on community detection (or clustering) in multi-layer directed networks. To take into account the asymmetry, a novel spectral-co-clustering-based algorithm is developed to detect co-clusters, which capture the sending patterns and receiving patterns of nodes, respectively. Specifically, the eigendecomposition of the debiased sum of Gram matrices over the layer-wise adjacency matrices is computed, followed by k-means clustering, where the sum of Gram matrices is used to avoid possible cancellation of clusters caused by direct summation. Theoretical analysis of the algorithm under the multi-layer stochastic co-block model is provided, where the common assumption that the cluster number is coupled with the rank of the model is relaxed. After a systematic analysis of the eigenvectors of the population version of the algorithm, the misclassification rates are derived, which show that multiple layers benefit the clustering performance. Experimental results on simulated data corroborate the theoretical predictions, and the analysis of a real-world trade network dataset provides interpretable results.
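The cancellation that direct summation can cause is easy to see at the population level. The sketch below uses two hypothetical 2×2 block connectivity matrices (rows index sending clusters, columns index receiving clusters) chosen so that their plain sum is constant, and compares it with the sums of Gram matrices. The matrices, helper names, and the `row_gap` diagnostic are assumptions made for illustration, and the debiasing step and the k-means stage from the abstract are omitted.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def row_gap(m):
    """Largest entrywise difference between the two rows; 0 means the rows coincide."""
    return max(abs(x - y) for x, y in zip(m[0], m[1]))


# Hypothetical layer-wise connectivity matrices B[k][l] = P(edge from cluster k to cluster l).
B1 = [[0.8, 0.2], [0.6, 0.4]]
B2 = [[0.2, 0.8], [0.4, 0.6]]

direct = add(B1, B2)                                                   # every entry is 1.0
gram_send = add(matmul(B1, transpose(B1)), matmul(B2, transpose(B2)))  # sum of B B^T
gram_recv = add(matmul(transpose(B1), B1), matmul(transpose(B2), B2))  # sum of B^T B

gaps = (row_gap(direct), row_gap(gram_send), row_gap(gram_recv))
```

In this toy case the direct sum has identical rows, so the two clusters cannot be told apart, while both Gram sums keep them separated. The `B B^T` form relates to sending patterns and the `B^T B` form to receiving patterns.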

Computational Statistics & Data Analysis, volume 198, Article 107987.
Citations: 0
A conditional approach for regression analysis of case K interval-censored failure time data with informative censoring
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-23 DOI: 10.1016/j.csda.2024.107991
Mingyue Du, Xingqiu Zhao

This paper discusses regression analysis of case K interval-censored failure time data, a general type of failure time data, in the presence of informative censoring, with a focus on simultaneous variable selection and estimation. Although many authors have considered the challenging variable selection problem for interval-censored data, most of the existing methods assume independent or non-informative censoring. More importantly, the existing methods that allow for informative censoring are frailty model-based approaches and, among other shortcomings, cannot directly assess the degree of informative censoring. To address these issues, we propose a conditional approach and develop a penalized sieve maximum likelihood procedure for the simultaneous variable selection and estimation of covariate effects. Furthermore, we establish the oracle property of the proposed method and illustrate the appropriateness and usefulness of the approach using a simulation study. Finally, we apply the proposed method to a set of real data on Alzheimer's disease and provide some new insights.

Computational Statistics & Data Analysis, volume 198, Article 107991.
Citations: 0
Principal component analysis for zero-inflated compositional data
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-21 DOI: 10.1016/j.csda.2024.107989
Kipoong Kim, Jaesung Park, Sungkyu Jung

Recent advances in DNA sequencing technology have led to a growing interest in microbiome data. Since the data are often high-dimensional, there is a clear need for dimensionality reduction. However, the compositional nature and zero-inflation of microbiome data present many challenges in developing new methodologies. New PCA methods for zero-inflated compositional data are presented, based on a novel framework called principal compositional subspace. These methods aim to identify both the principal compositional subspace and the corresponding principal scores that best approximate the given data, ensuring that their reconstruction remains within the compositional simplex. To this end, the constrained optimization problems are established and alternating minimization algorithms are provided to solve the problems. The theoretical properties of the principal compositional subspace, particularly focusing on its existence and consistency, are further investigated. Simulation studies have demonstrated that the methods achieve lower reconstruction errors than the existing log-ratio PCA in the presence of a linear pattern and have shown comparable performance in a curved pattern. The methods have been applied to four microbiome compositional datasets with excessive zeros, successfully recovering the underlying low-rank structure.
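Why zeros are a problem for log-ratio approaches can be shown with the centered log-ratio (clr) transform, which maps a composition to the logs of its parts minus their mean. The sketch below is only an illustration of that obstacle. The helper names `close` and `clr` and the example counts are assumptions, and this is not the principal compositional subspace method described in the abstract.

```python
import math
from typing import List, Sequence


def close(x: Sequence[float]) -> List[float]:
    """Rescale non-negative parts so they sum to one (a point on the simplex)."""
    total = sum(x)
    return [v / total for v in x]


def clr(x: Sequence[float]) -> List[float]:
    """Centered log-ratio transform; math.log raises ValueError if any part is zero."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


dense = close([20, 30, 50])
clr_dense = clr(dense)          # well defined; components sum to zero

sparse = close([0, 30, 70])     # a composition with an exact zero
try:
    clr(sparse)
    zero_ok = True
except ValueError:
    zero_ok = False             # the log of zero is undefined
```

A common workaround is to replace zeros with a small pseudo-count before taking logs, which changes the data. Methods that work directly on the simplex avoid that preprocessing step.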

Computational Statistics & Data Analysis, volume 198, Article 107989.
Citations: 0
Gibbs sampler approach for objective Bayesian inference in elliptical multivariate meta-analysis random effects model
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-20 DOI: 10.1016/j.csda.2024.107990
Olha Bodnar, Taras Bodnar

Bayesian inference procedures for the parameters of the multivariate random effects model are derived under the assumption of an elliptically contoured distribution when the Berger and Bernardo reference and the Jeffreys priors are assigned to the model parameters. A new numerical algorithm for drawing samples from the posterior distribution is developed, which is based on the hybrid Gibbs sampler. The new approach is compared to the two Metropolis-Hastings algorithms previously derived in the literature via an extensive simulation study. The findings are applied to a Bayesian multivariate meta-analysis, conducted using the results of ten studies on the effectiveness of a treatment for hypertension. The analysis investigates the treatment effects on systolic and diastolic blood pressure. The second empirical illustration deals with measurement data from the CCAUV.V-K1 key comparison, aiming to compare measurement results of sinusoidal linear accelerometers at four frequencies.

Computational Statistics & Data Analysis, volume 197, Article 107990. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324000744/pdfft?md5=f03345bd15314ef0a3bf57ae49fa38db&pid=1-s2.0-S0167947324000744-main.pdf
Citations: 0
Goodness–of–fit tests based on the min–characteristic function
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-16 DOI: 10.1016/j.csda.2024.107988
S.G. Meintanis, B. Milošević, M.D. Jiménez–Gamero

Tests of fit for classes of distributions that include the Weibull, the Pareto and the Fréchet families are proposed. The new tests employ the novel tool of the min–characteristic function and are based on an L²–type weighted distance between this function and its empirical counterpart applied on suitably standardized data. If data–standardization is performed using the MLE of the distributional parameters then the method reduces to testing for the standard member of the family, with parameter values known and set equal to one. Asymptotic properties of the tests are investigated. A Monte Carlo study is presented that includes the new procedure as well as competitors for the purpose of specification testing with three extreme value distributions. The new tests are also applied on a few real–data sets.

Computational Statistics & Data Analysis, volume 197, Article 107988. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324000720/pdfft?md5=345ca757392dc6a128ee30fc0f6964c2&pid=1-s2.0-S0167947324000720-main.pdf
Citations: 0
Rank-based sequential feature selection for high-dimensional accelerated failure time models with main and interaction effects
IF 1.8 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-13 DOI: 10.1016/j.csda.2024.107978
Ke Yu, Shan Luo

High-dimensional accelerated failure time (AFT) models are commonly used regression models in survival analysis. The feature selection problem in high-dimensional AFT models is addressed, considering scenarios involving solely main effects or both main and interaction effects. A rank-based sequential feature selection (RankSFS) method is proposed. Its selection consistency is established, and its performance is compared with that of existing methods through extensive numerical simulations. The results show that RankSFS achieves a higher Positive Discovery Rate (PDR) and a lower False Discovery Rate (FDR). Additionally, RankSFS is applied to data on breast cancer relapse. With a remarkably short computational time, RankSFS successfully identifies two crucial genes.

Computational Statistics & Data Analysis, volume 197, Article 107978.
Citations: 0
A switching state-space transmission model for tracking epidemics and assessing interventions 用于追踪流行病和评估干预措施的切换状态空间传播模型
IF 1.8 Zone 3, Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-09 DOI: 10.1016/j.csda.2024.107977
Jingxue Feng, Liangliang Wang

The effective control of infectious diseases relies on accurate assessment of the impact of interventions, which is often hindered by the complex dynamics of the spread of disease. A Beta-Dirichlet switching state-space transmission model is proposed to track the underlying dynamics of disease and evaluate the effectiveness of interventions simultaneously. As time evolves, the switching mechanism introduced in the susceptible-exposed-infected-recovered (SEIR) model is able to capture the timing and magnitude of changes in the transmission rate due to the effectiveness of control measures. The implementation of this model is based on a particle Markov Chain Monte Carlo algorithm, which can estimate the time evolution of SEIR states, switching states, and high-dimensional parameters efficiently. The efficacy of the proposed model and estimation procedure is demonstrated through simulation studies. With a real-world application to British Columbia's COVID-19 outbreak, the proposed switching state-space transmission model quantifies the reduction of transmission rate following interventions. The proposed model provides a promising tool to inform public health policies aimed at studying the underlying dynamics and evaluating the effectiveness of interventions during the spread of the disease.
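To make the idea of a switching transmission rate concrete, here is a minimal sketch of a deterministic, discrete-time SEIR model in which the transmission rate changes once, on a known day. It is not the Beta-Dirichlet state-space model or the particle MCMC procedure described in the abstract; the function name `simulate_seir`, the daily time step, and all parameter values are assumptions chosen purely for illustration.

```python
def simulate_seir(days, population, beta_before, beta_after, switch_day,
                  sigma=1 / 5, gamma=1 / 7, initial_infected=10):
    """Simulate daily S, E, I, R counts with a one-time switch in transmission rate.

    beta_before / beta_after: transmission rate before and from `switch_day` on.
    sigma: rate of leaving the exposed state; gamma: recovery rate.
    All values are illustrative assumptions, not estimates from the paper.
    """
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    history = []
    for day in range(days):
        # The "switch": an intervention changes the transmission rate.
        beta = beta_before if day < switch_day else beta_after
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_recovered = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((day, beta, s, e, i, r))
    return history


# Compare an intervention on day 30 with no intervention at all.
with_switch = simulate_seir(120, 1_000_000, 0.6, 0.2, switch_day=30)
no_switch = simulate_seir(120, 1_000_000, 0.6, 0.6, switch_day=30)
print("peak infectious with switch:", round(max(h[4] for h in with_switch)))
print("peak infectious without switch:", round(max(h[4] for h in no_switch)))
```

In the model described by the abstract, the timing and size of such changes are not fixed in advance but are inferred from data together with the SEIR states; the sketch only shows how a change in the transmission rate alters the epidemic trajectory.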

Computational Statistics & Data Analysis, vol. 197, Article 107977. Published 2024-05-09. DOI: 10.1016/j.csda.2024.107977. PDF: https://www.sciencedirect.com/science/article/pii/S0167947324000616/pdfft?md5=2ef429fe1ac8d3ce2c054c514b5fee1b&pid=1-s2.0-S0167947324000616-main.pdf
Citations: 0