
Statistics and Computing: Latest Articles

A limit formula and a series expansion for the bivariate Normal tail probability
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-08; DOI: 10.1007/s11222-024-10466-w
Siu-Kui Au

This work presents a limit formula for the bivariate Normal tail probability. It only requires the larger threshold to grow indefinitely, but otherwise has no restrictions on how the thresholds grow. The correlation parameter can change and possibly depend on the thresholds. The formula is applicable regardless of Salvage’s condition. Asymptotically, it reduces to Ruben’s formula and Hashorva’s formula under the corresponding conditions, and therefore can be considered a generalisation. Under a mild condition, it satisfies Plackett’s identity on the derivative with respect to the correlation parameter. Motivated by the limit formula, a series expansion is also obtained for the exact tail probability using derivatives of the univariate Mills’ ratio. Under conditions similar to those for the limit formula, the series converges and its truncated approximation has a small remainder term for large thresholds. To take advantage of this, a simple procedure is developed for the general case by remapping the parameters so that they satisfy the conditions. Examples are presented to illustrate the theoretical findings.
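For orientation, the quantity discussed in this abstract, P(X > a, Y > b) for standard Normal X and Y with correlation rho, can be evaluated by brute-force quadrature. The sketch below is not the paper's limit formula or series expansion; it is a plain Simpson-rule reference value, and the function names (`bvn_tail`, `mills_ratio`), the integration cutoff and the number of panels are assumptions chosen for illustration.

```python
import math


def phi(x: float) -> float:
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x: float) -> float:
    """Univariate upper tail probability P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_ratio(x: float) -> float:
    """Univariate Mills' ratio: upper tail probability divided by density."""
    return upper_tail(x) / phi(x)


def bvn_tail(a: float, b: float, rho: float, n: int = 4000) -> float:
    """P(X > a, Y > b) by Simpson's rule (illustrative, not the paper's method).

    Conditions on X = x: Y | X = x is Normal(rho * x, 1 - rho**2), so the
    probability is the integral over x > a of phi(x) * P(Y > b | X = x).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    s = math.sqrt(1.0 - rho * rho)
    hi = max(a, 0.0) + 12.0  # assumed cutoff; the integrand is negligible beyond it
    h = (hi - a) / n

    def f(x: float) -> float:
        return phi(x) * upper_tail((b - rho * x) / s)

    total = f(a) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0
```

Two standard checks are available: with a = b = 0 the probability equals 1/4 + arcsin(rho)/(2*pi), and with rho = 0 it factorises into the product of the two univariate tails. For very large thresholds the result underflows or loses relative accuracy, which is the regime where asymptotic formulas of the kind described above become useful.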

Citations: 0
Classifier-dependent feature selection via greedy methods
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-06; DOI: 10.1007/s11222-024-10460-2
Fabiana Camattari, Sabrina Guastavino, Francesco Marchetti, Michele Piana, Emma Perracchione

The purpose of this study is to introduce a new approach to feature ranking for classification tasks, referred to in what follows as greedy feature selection. In statistical learning, feature selection is usually realized by means of methods that are independent of the classifier applied to perform the prediction using the reduced number of features. Instead, greedy feature selection identifies the most important feature at each step according to the selected classifier. The benefits of such a scheme are investigated in terms of model capacity indicators, such as the Vapnik-Chervonenkis dimension or the kernel alignment. This theoretical study proves that the iterative greedy algorithm is able to construct classifiers whose complexity capacity grows at each step. The proposed method is then tested numerically on various datasets and compared to state-of-the-art techniques. The results show that our iterative scheme is able to capture only a few relevant features, and may improve, especially for real and noisy data, the accuracy scores of other techniques. The greedy scheme is also applied to the challenging task of predicting geo-effective manifestations of the active Sun.
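The general idea of classifier-dependent greedy ranking can be sketched as a forward-selection loop: at each step, add the feature that most improves the score of the chosen classifier. The code below is a minimal illustration under stated assumptions, not the authors' algorithm: the nearest-centroid classifier, the use of training accuracy as the score, the toy data and all function names are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_feature_ranking(
    n_features: int,
    score: Callable[[List[int]], float],
) -> Tuple[List[int], List[float]]:
    """Rank features by repeatedly adding the one that maximises `score`."""
    selected: List[int] = []
    history: List[float] = []
    remaining = list(range(n_features))
    while remaining:
        # Evaluate the classifier-dependent score for each candidate extension.
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))
    return selected, history


def nearest_centroid_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], cols: List[int]
) -> float:
    """Training accuracy of a nearest-centroid classifier on the given columns."""
    labels = sorted(set(y))
    centroids = {}
    for label in labels:
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            labels,
            key=lambda l: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[l])),
        )
        correct += pred == t
    return correct / len(y)


# Toy data (hypothetical): feature 1 separates the classes, feature 0 does not.
X = [[0.1, 0.0, 0.2], [0.9, 0.1, 0.1], [0.2, 1.0, 0.9], [0.8, 0.9, 0.3]]
y = [0, 0, 1, 1]
ranking, history = greedy_feature_ranking(
    3, lambda cols: nearest_centroid_accuracy(X, y, cols)
)
```

In practice the score would normally be a validation or cross-validated metric rather than training accuracy, and the loop would stop once the score no longer improves.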

Citations: 0
Locally sparse and robust partial least squares in scalar-on-function regression
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-06; DOI: 10.1007/s11222-024-10464-y
Sude Gurer, Han Lin Shang, Abhijit Mandal, Ufuk Beyaztas

We present a novel approach for estimating a scalar-on-function regression model, leveraging a functional partial least squares methodology. Our proposed method involves computing the functional partial least squares components through sparse partial robust M regression, facilitating robust and locally sparse estimations of the regression coefficient function. This strategy delivers a robust decomposition for the functional predictor and regression coefficient functions. After the decomposition, model parameters are estimated using a weighted loss function, incorporating robustness through iterative reweighting of the partial least squares components. The robust decomposition feature of our proposed method enables the robust estimation of model parameters in the scalar-on-function regression model, ensuring reliable predictions in the presence of outliers and leverage points. Moreover, it accurately identifies zero and nonzero sub-regions where the slope function is estimated, even in the presence of outliers and leverage points. We assess our proposed method’s estimation and predictive performance through a series of Monte Carlo experiments and an empirical dataset, namely data collected in relation to oriented strand board. Compared to existing methods, our proposed method performs favorably. Notably, our robust procedure exhibits superior performance in the presence of outliers while maintaining competitiveness in their absence. Our method has been implemented in the robsfplsr package.

Citations: 0
Efficient Shapley performance attribution for least-squares regression
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-04; DOI: 10.1007/s11222-024-10459-9
Logan Bell, Nikhil Devanathan, Stephen Boyd

We consider the performance of a least-squares regression model, as judged by out-of-sample \(R^2\). Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speedup, allowing many more Monte Carlo samples to be evaluated, achieving better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
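The Monte Carlo-type approximation mentioned in this abstract is commonly realised by sampling random feature orderings and averaging each feature's marginal contribution. The sketch below shows that generic estimator only; it is not LS-SPA and contains none of the least-squares speedups. The `value` callable is a placeholder for "performance of the model fitted on this feature subset" (out-of-sample R^2 in the setting above), and the toy value function is a hypothetical stand-in.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_monte_carlo(
    n_features: int,
    value: Callable[[FrozenSet[int]], float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_features
    order = list(range(n_features))
    for _ in range(n_permutations):
        rng.shuffle(order)
        subset: set = set()
        previous = value(frozenset(subset))
        for j in order:
            subset.add(j)
            current = value(frozenset(subset))
            totals[j] += current - previous  # marginal contribution of feature j
            previous = current
    return [t / n_permutations for t in totals]


# Hypothetical value function with overlapping explanatory power between features.
strengths = [0.5, 0.3, 0.2]


def toy_value(subset: FrozenSet[int]) -> float:
    unexplained = 1.0
    for j in subset:
        unexplained *= 1.0 - strengths[j]
    return 1.0 - unexplained


attributions = shapley_monte_carlo(3, toy_value, n_permutations=500)
```

Because the marginal gains telescope within every ordering, the estimated attributions always sum to the value of the full feature set minus the value of the empty set, regardless of how many orderings are sampled. Each ordering costs one model evaluation per feature, which is why cheaper evaluations translate directly into more samples.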

Citations: 0
On weak convergence of quantile-based empirical likelihood process for ROC curves
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-04; DOI: 10.1007/s11222-024-10457-x
Hu Jiang, Liu Yiming, Zhou Wang

The empirical likelihood (EL) method possesses desirable qualities such as automatically determining confidence regions and circumventing the need for variance estimation. As an extension, a quantile-based EL (QEL) method is considered, which results in a simpler form. In this paper, we explore the framework of the QEL method. Firstly, we explore the weak convergence of the −2 log empirical likelihood ratio for ROC curves. We also introduce a novel statistic for testing the entire ROC curve and the equality of two distributions. To validate our approach, we conduct simulation studies and analyze real data from hepatitis C patients, comparing our method with existing ones.

Citations: 0
Correction to: analysis of stochastic gradient descent in continuous time
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-02; DOI: 10.1007/s11222-024-10450-4
Jonas Latz

A correction regarding [Latz 2021, Stat. Comput. 31, 39].

Citations: 0
Systemic infinitesimal over-dispersion on graphical dynamic models
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-07-02; DOI: 10.1007/s11222-024-10443-3
Ning Ning, Edward Ionides

Stochastic models for collections of interacting populations have crucial roles in many scientific fields such as epidemiology, ecology, performance engineering, and queueing theory, to name a few. However, the standard approach to extending an ordinary differential equation model to a Markov chain does not have sufficient flexibility in the mean-variance relationship to match data. To handle that, we develop new approaches using Dirichlet noise to construct collections of independent or dependent noise processes. This permits the modeling of high-frequency variation in transition rates both within and between the populations under study. Our theory is developed in a general framework of time-inhomogeneous Markov processes equipped with a general graphical structure. We demonstrate our approach on a widely analyzed measles dataset, adding Dirichlet noise to a classical Susceptible–Exposed–Infected–Recovered model. Our methodology shows improved statistical fit measured by log-likelihood and provides new insights into the dynamics of this biological system.

Citations: 0
Greedy recursive spectral bisection for modularity-bound hierarchical divisive community detection
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-06-27; DOI: 10.1007/s11222-024-10451-3
Douglas O. Cardoso, João Domingos Gomes da Silva Junior, Carla Silva Oliveira, Celso Marques, Laura Silva de Assis

Spectral clustering techniques depend on the eigenstructure of a similarity matrix to assign data points to clusters, so that points within the same cluster exhibit high similarity and are compared to those in different clusters. This work aimed to develop a spectral method that could be compared to clustering algorithms that represent the current state of the art. This investigation conceived a novel spectral clustering method, as well as five policies that guide its execution, based on spectral graph theory and embodying hierarchical clustering principles. Computational experiments comparing the proposed method with six state-of-the-art algorithms were undertaken in this study to evaluate the clustering methods under scrutiny. The assessment was performed using two evaluation metrics, namely the adjusted Rand index and modularity. The obtained results furnish compelling evidence, indicating that the proposed method is competitive and possesses distinctive properties compared to those elucidated in the existing literature. This suggests that our approach stands as a viable alternative, offering a robust choice within the spectrum of available same-purpose tools.
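Modularity, one of the two evaluation metrics named in this abstract, has a compact definition for an undirected, unweighted graph: for each community, the fraction of edges that fall inside it minus the squared fraction of total degree it holds. The sketch below computes that standard quantity only; it does not implement the proposed spectral bisection method, and the function name and the example graph are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community: Dict[Hashable, Hashable],
) -> float:
    """Newman modularity of a partition (undirected, unweighted, no self-loops)."""
    edge_list = list(edges)
    m = len(edge_list)
    degree: Dict[Hashable, int] = {}
    internal: Dict[Hashable, int] = {}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    degree_sum: Dict[Hashable, int] = {}
    for node, d in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    # Sum over communities: internal edge fraction minus squared degree fraction.
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Example graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, two_triangles)
q_single = modularity(edges, {node: "all" for node in range(6)})
```

Splitting the example graph at the bridge gives a modularity of 5/14, while placing every node in one community gives 0, so a divisive method guided by modularity would accept this bisection.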

Citations: 0
Flexible Bayesian quantile regression based on the generalized asymmetric Huberised-type distribution
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-06-27; DOI: 10.1007/s11222-024-10453-1
Weitao Hu, Weiping Zhang

To enhance the robustness and flexibility of Bayesian quantile regression models using the asymmetric Laplace or asymmetric Huberised-type (AH) distribution, which lacks changeable mode, diminishing influence of outliers, and asymmetry under median regression, we propose a new generalized AH distribution which is achieved through a hierarchical mixture representation, thus leading to a flexible Bayesian Huberised quantile regression framework. With many parameters in the model, we develop an efficient Markov chain Monte Carlo procedure based on the Metropolis-within-Gibbs sampling algorithm. The robustness and flexibility of the new distribution are examined through intensive simulation studies and application to two real data sets.
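As background to the asymmetric Laplace working likelihood mentioned in this abstract: maximising that likelihood in the location parameter is equivalent to minimising the quantile check loss. The sketch below illustrates the check loss only; it does not implement the generalized AH distribution or the Metropolis-within-Gibbs sampler, and the function names and data are hypothetical.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys: Sequence[float], tau: float) -> float:
    """Observed value minimising the total check loss (a sample tau-quantile)."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


# Toy data (hypothetical) with one gross outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(ys, 0.5)
```

At tau = 0.5 the loss is half the absolute error, so the minimiser is the sample median and the outlier at 100 has limited pull on the fit. Huberised variants replace the kink at zero with a smooth quadratic region.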

Citations: 0
Structured prior distributions for the covariance matrix in latent factor models
IF 2.2; Zone 2 (Mathematics); Q2 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2024-06-26; DOI: 10.1007/s11222-024-10454-0
Sarah Elizabeth Heaps, Ian Hyla Jermyn

Factor models are widely used for dimension reduction in the analysis of multivariate data. This is achieved through decomposition of a \(p \times p\) covariance matrix into the sum of two components. Through a latent factor representation, they can be interpreted as a diagonal matrix of idiosyncratic variances and a shared variation matrix, that is, the product of a \(p \times k\) factor loadings matrix and its transpose. If \(k \ll p\), this defines a parsimonious factorisation of the covariance matrix. Historically, little attention has been paid to incorporating prior information in Bayesian analyses using factor models where, at best, the prior for the factor loadings is order invariant. In this work, a class of structured priors is developed that can encode ideas of dependence structure about the shared variation matrix. The construction allows data-informed shrinkage towards sensible parametric structures while also facilitating inference over the number of factors. Using an unconstrained reparameterisation of stationary vector autoregressions, the methodology is extended to stationary dynamic factor models. For computational inference, parameter-expanded Markov chain Monte Carlo samplers are proposed, including an efficient adaptive Gibbs sampler. Two substantive applications showcase the scope of the methodology and its inferential benefits.
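The decomposition described in this abstract can be written out directly: the covariance matrix is the loadings matrix times its transpose (the shared variation) plus a diagonal matrix of idiosyncratic variances. The sketch below builds that matrix and compares parameter counts; it illustrates the factor structure only, not the structured priors proposed in the paper, and the function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def factor_covariance(
    loadings: Sequence[Sequence[float]], idiosyncratic: Sequence[float]
) -> List[List[float]]:
    """Covariance implied by a factor model: Lambda Lambda^T + diag(psi)."""
    p = len(loadings)
    if len(idiosyncratic) != p:
        raise ValueError("need one idiosyncratic variance per variable")
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (idiosyncratic[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


def parameter_counts(p: int, k: int) -> Tuple[int, int]:
    """(factor model, unrestricted covariance) free-parameter counts.

    The factor count p * k + p is an upper bound: it ignores the rotational
    invariance of the loadings.
    """
    return p * k + p, p * (p + 1) // 2


# Toy example (hypothetical): p = 3 variables, k = 1 factor.
cov = factor_covariance([[1.0], [2.0], [3.0]], [0.5, 0.5, 0.5])
```

With p = 10 and k = 2 the factor structure uses at most 30 parameters against 55 for an unrestricted covariance matrix, and the gap widens quickly as p grows with k fixed.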

Citations: 0