Computational Statistics: Latest Articles

Generalized linear model based on latent factors and supervised components
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-09-02 DOI: 10.1007/s00180-024-01544-8
Julien Gibaud, Xavier Bry, Catherine Trottier

In the context of component-based multivariate modeling, we propose to model the residual dependence of the responses. Each response in a response vector is assumed to depend, through a Generalized Linear Model, on a set of explanatory variables. Most explanatory variables are partitioned into conceptually homogeneous variable groups, viewed as explanatory themes. Themes are assumed to contain many variables, some of which are highly correlated or even collinear, so generalized linear regression requires dimension reduction and regularization within each theme. In addition to the themes, we consider a small set of “additional” covariates that are not conceptually linked to the themes and require no regularization. Supervised Component Generalized Linear Regression was proposed to both regularize and reduce the dimension of the explanatory space by searching each theme for an appropriate number of orthogonal components that both help predict the responses and capture relevant structural information in the themes. In this paper, we introduce random latent variables (a.k.a. factors) to model the covariance matrix of the linear predictors of the responses, conditional on the components. To estimate the model, we present an algorithm that combines supervised component-based model estimation with factor model estimation. The methodology is tested on simulated data and then applied to an agricultural ecology dataset.

Citations: 0
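Here is a minimal simulation sketch of the model structure described in this abstract. It assumes a Poisson family with log link, components T that are already computed, and linear predictors of the form eta_i = T_i·gamma + B·f_i with latent factors f_i ~ N(0, I). Under these assumptions, the covariance of the linear predictors conditional on the components is B·B^T. The names T, gamma, B and simulate_factor_glm are hypothetical. The code only generates data; it does not implement the authors' estimation algorithm.

```python
import numpy as np

def simulate_factor_glm(n, T, gamma, B, rng):
    """Simulate Poisson responses whose linear predictors share latent factors.

    Illustrative sketch only (hypothetical names):
    T     : (n, K) supervised components, assumed already computed
    gamma : (K, q) coefficients linking components to the q responses
    B     : (q, r) factor loadings, so cov(eta | T) = B @ B.T here
    """
    r = B.shape[1]
    f = rng.standard_normal((n, r))      # latent factors f_i ~ N(0, I_r)
    eta = T @ gamma + f @ B.T            # linear predictors, shape (n, q)
    y = rng.poisson(np.exp(eta))         # log link: E[y | eta] = exp(eta)
    return y, eta

rng = np.random.default_rng(0)
n, K, q, r = 50_000, 2, 3, 1
T = rng.standard_normal((n, K))
gamma = np.array([[0.3, -0.2, 0.1],
                  [0.1, 0.2, -0.3]])
B = np.array([[0.5], [0.4], [-0.3]])

y, eta = simulate_factor_glm(n, T, gamma, B, rng)

# Removing the component part leaves f @ B.T, whose empirical covariance
# should be close to the implied residual covariance B @ B.T.
resid = eta - T @ gamma
emp_cov = np.cov(resid, rowvar=False)
print(np.round(emp_cov, 3))
print(np.round(B @ B.T, 3))
```

A single shared factor (r = 1) is enough to induce correlation between all three responses. This is the residual dependence the paper sets out to model.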
SpICE: an interpretable method for spatial data
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-26 DOI: 10.1007/s00180-024-01538-6
Natalia da Silva, Ignacio Alvarez-Castro, Leonardo Moreno, Andrés Sosa

Statistical learning methods are widely used to tackle complex problems because of their flexibility, good predictive performance and ability to capture complex relationships among variables. In addition, recently developed automatic workflows provide a standardised approach for implementing statistical learning methods across various applications. However, these tools highlight one of the main drawbacks of statistical learning: the lack of interpretability of the results. In the past few years, much research has focused on methods for interpreting black-box models. Interpretable statistical learning methods are necessary for a deeper understanding of these models. Specifically, in problems where spatial information is relevant, combining interpretable methods with spatial data can provide a better understanding of the problem and an improved interpretation of the results. This paper focuses on the individual conditional expectation plot (ICE-plot), a model-agnostic method for interpreting statistical learning models, and on combining it with spatial information. An ICE-plot extension is proposed in which spatial information is used as a restriction to define spatial ICE (SpICE) curves. SpICE curves are estimated from real data in the context of an economic problem concerning property valuation in Montevideo, Uruguay. Understanding the key factors that influence property valuation is essential for decision-making, and spatial data play an important role in this regard.

Citations: 0
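This sketch computes ICE curves and adds one hypothetical way of using space as a restriction. Within each region, the grid of feature values is limited to the range actually observed in that region, so curves are not extrapolated to values that never occur there. The regional restriction, the toy price model and the function names are assumptions for illustration, not the SpICE definition from the paper. `predict` can be any fitted model's prediction function.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Individual conditional expectation curves.

    Row i of the result holds the predictions for observation i with column
    `feature` set to each grid value and all other columns left as observed.
    """
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, k] = predict(X_mod)
    return curves

def spatially_restricted_ice(predict, X, feature, regions, n_grid=20):
    """Hypothetical spatial variant: per region, ICE curves are computed only
    over the range of `feature` observed inside that region."""
    out = {}
    for region in np.unique(regions):
        mask = regions == region
        x_reg = X[mask, feature]
        grid = np.linspace(x_reg.min(), x_reg.max(), n_grid)
        out[int(region)] = (grid, ice_curves(predict, X[mask], feature, grid))
    return out

def toy_price_model(Z):
    """Toy black box: price grows with surface area, faster in region 1."""
    return 1000 * Z[:, 0] * (1 + 0.5 * Z[:, 1])

rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(40, 200, 300),   # surface area (illustrative)
                     rng.integers(0, 2, 300)])    # region indicator 0/1
regions = X[:, 1].astype(int)

res = spatially_restricted_ice(toy_price_model, X, feature=0, regions=regions)
grid0, curves0 = res[0]
print(curves0.shape)
```

Because the toy model is linear in area, every ICE curve in region 0 has slope 1000 and every curve in region 1 has slope 1500. With a real black-box model, heterogeneity between regions would show up as differently shaped curve bundles.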
Performance of evaluation metrics for classification in imbalanced data
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-24 DOI: 10.1007/s00180-024-01539-5
Alex de la Cruz Huayanay, Jorge L. Bazán, Cibele M. Russo

This paper investigates the effectiveness of various metrics for selecting an adequate model for binary classification when the data are imbalanced. Through an extensive simulation study involving 12 commonly used classification metrics, our findings indicate that the Matthews Correlation Coefficient, G-Mean, and Cohen’s kappa consistently yield favorable performance. Conversely, the area under the curve (AUC) and accuracy metrics perform poorly across all studied scenarios, while the other seven metrics are effective to varying degrees in specific scenarios. Furthermore, we discuss a practical application in finance, which confirms the robust performance of these metrics in facilitating model selection among alternative link functions.

Citations: 0
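The three metrics the study found most reliable can be computed directly from a binary confusion matrix. The sketch below assumes the positive class is the minority class. AUC is left out because it needs predicted scores rather than hard labels. The example shows why accuracy misleads under imbalance: a classifier that always predicts the majority class reaches 0.95 accuracy on a 95/5 split, while MCC, G-mean and kappa are all 0.

```python
import math

def imbalance_metrics(tp, fp, fn, tn):
    """MCC, G-mean, Cohen's kappa and accuracy from a binary confusion matrix."""
    n = tp + fp + fn + tn

    # Matthews correlation coefficient (defined as 0 when a margin is empty)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0

    # G-mean: geometric mean of sensitivity and specificity
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(sens * spec)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 0.0

    return {"mcc": mcc, "g_mean": g_mean, "kappa": kappa, "accuracy": p_obs}

# Always predicting the majority (negative) class on 95 negatives / 5 positives
majority = imbalance_metrics(tp=0, fp=0, fn=5, tn=95)
print(majority)  # accuracy 0.95, while mcc, g_mean and kappa are all 0.0

# A perfect classifier on the same data
perfect = imbalance_metrics(tp=5, fp=0, fn=0, tn=95)
print(perfect)
```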
A theory of contrasts for modified Freeman–Tukey statistics and its applications to Tukey’s post-hoc tests for contingency tables
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-17 DOI: 10.1007/s00180-024-01537-7
Yoshio Takane, Eric J. Beh, Rosaria Lombardo

This paper presents a theory of contrasts designed for modified Freeman–Tukey (FT) statistics, which are derived through square-root transformations of observed frequencies (proportions) in contingency tables. Some modifications of the original FT statistic are necessary to allow ANOVA-like exact decompositions of global goodness-of-fit (GOF) measures. The square-root transformations have the important effect of stabilizing (equalizing) variances. The theory is then used to derive Tukey’s post-hoc pairwise comparison tests for contingency tables. Tukey’s tests are more restrictive but more powerful than Scheffé’s post-hoc tests, which were developed earlier for the analysis of contingency tables. Numerical examples are given throughout the paper to illustrate the theory. Modified FT statistics, like other similar statistics for contingency tables, are based on a large-sample rationale. Small Monte Carlo studies are conducted to investigate the asymptotic (and non-asymptotic) behavior of the proposed statistics.

Citations: 0
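Here is a sketch of the square-root idea behind Freeman–Tukey statistics. If a cell count O is roughly Poisson with mean E, then Var(sqrt(O)) is approximately 1/4. So 2(sqrt(O) − sqrt(E)) behaves roughly like a standard normal residual whatever the size of E. The function uses the common form 4·Σ(sqrt(O) − sqrt(E))², with expected counts computed under independence. In large samples it is referred to a chi-squared distribution with (r − 1)(c − 1) degrees of freedom. This is one standard variant; it is not the modified statistic or the contrast decomposition developed in the paper.

```python
import numpy as np

def freeman_tukey(observed):
    """Freeman-Tukey-type independence statistic for a two-way table.

    Uses 4 * sum((sqrt(O) - sqrt(E))**2) with E the expected counts under
    independence. One common variant, not necessarily the paper's modified form.
    Returns the statistic and the variance-stabilized cell residuals.
    """
    O = np.asarray(observed, dtype=float)
    n = O.sum()
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / n   # row total * column total / n
    residuals = 2.0 * (np.sqrt(O) - np.sqrt(E))      # approx. N(0, 1) per cell
    return float((residuals ** 2).sum()), residuals

table = [[20, 30],
         [30, 20]]
stat, resid = freeman_tukey(table)
df = (len(table) - 1) * (len(table[0]) - 1)
print(round(stat, 3), df)
print(np.round(resid, 3))
```

Because each cell residual is on an approximately unit scale, the residuals can be compared directly across cells with very different expected counts. This is the variance-equalizing property the abstract relies on.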
A novel nonconvex, smooth-at-origin penalty for statistical learning
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-07 DOI: 10.1007/s00180-024-01525-x
Majnu John, Sujit Vettam, Yihren Wu

Nonconvex penalties are used for regularization in high-dimensional statistical learning algorithms primarily because they yield unbiased or nearly unbiased estimators of the model parameters. Nonconvex penalties in the literature, such as SCAD, MCP, Laplace and arctan, have a singularity at the origin, which also makes them useful for variable selection. However, in several high-dimensional frameworks, such as deep learning, variable selection is less of a concern. In this paper, we present a nonconvex penalty that is smooth at the origin. The paper includes asymptotic results for ordinary least squares estimators regularized with the new penalty function, showing that the asymptotic bias vanishes exponentially fast. We also conducted simulations to better understand the finite-sample properties, as well as an empirical study employing a deep neural network architecture on three datasets and a convolutional neural network on four datasets. In this empirical study based on artificial neural networks, the new regularization approach performed better on five of the seven datasets.

Citations: 0
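To see what "smooth at the origin" changes, compare one-sided slopes at zero. The penalty p(t) = λt²/(t² + γ) below is a hypothetical stand-in, chosen only because it is nonconvex, bounded and differentiable at 0. It is not the penalty proposed in the paper. It is contrasted with a Laplace-type penalty λ(1 − exp(−|t|/γ)), whose kink at 0 is what lets such penalties set coefficients exactly to zero. The constants LAM and GAMMA are arbitrary.

```python
import numpy as np

LAM, GAMMA = 1.0, 0.5   # arbitrary illustrative tuning constants

def smooth_penalty(t, lam=LAM, gamma=GAMMA):
    """Hypothetical nonconvex penalty, smooth at 0 (NOT the paper's penalty):
    p(t) = lam * t^2 / (t^2 + gamma), bounded above by lam."""
    t = np.asarray(t, dtype=float)
    return lam * t**2 / (t**2 + gamma)

def laplace_type_penalty(t, lam=LAM, gamma=GAMMA):
    """Nonconvex penalty with a kink at 0: p(t) = lam * (1 - exp(-|t| / gamma))."""
    t = np.asarray(t, dtype=float)
    return lam * (1.0 - np.exp(-np.abs(t) / gamma))

def one_sided_slopes(p, h=1e-6):
    """Right and left difference quotients of p at the origin."""
    right = (p(h) - p(0.0)) / h
    left = (p(0.0) - p(-h)) / h
    return float(right), float(left)

print(one_sided_slopes(smooth_penalty))        # both near 0: differentiable at 0
print(one_sided_slopes(laplace_type_penalty))  # near +2 and -2: kink at 0

# Nonconvexity: the second difference is positive near 0 but negative at t = 1
second_diff = lambda p, t, h=0.1: float(p(t - h) + p(t + h) - 2 * p(t))
print(second_diff(smooth_penalty, 0.0), second_diff(smooth_penalty, 1.0))
```

With a continuous gradient at zero, gradient-based training (as in deep learning) can shrink small weights without forcing them exactly to zero. This matches the abstract's point that variable selection is less of a concern there.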
Quantinar: a blockchain peer-to-peer ecosystem for modern data analytics
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-06 DOI: 10.1007/s00180-024-01529-7
Raul Bag, Bruno Spilak, Julian Winkel, Wolfgang Karl Härdle

The power of data and correct statistical analysis has never been more prevalent. Academics and practitioners nowadays require accurate application of quantitative methods. Yet many fields face a crisis of integrity, manifested in the improper use of statistical models, p-hacking, HARKing (hypothesizing after the results are known), or failure to replicate results. We propose Quantinar, a peer-to-peer (P2P) ecosystem based on a blockchain network, to support quantitative analytics knowledge paired with code in the form of Quantlets or software snippets. Integrating blockchain technology allows Quantinar to ensure fully transparent and reproducible scientific research.

Citations: 0
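The reproducibility argument rests on tamper evidence. Each record commits to a hash of the code and to the previous record, so altering an earlier entry is detected on verification unless every later record is rewritten as well. The sketch below is a generic hash chain using only the Python standard library, with hypothetical record fields. It is not Quantinar's implementation and omits networking, consensus and peer-to-peer replication entirely.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def content_hash(payload: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain: list, quantlet: dict) -> list:
    """Return a new chain with a record committing to `quantlet` and the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"index": len(chain), "prev_hash": prev_hash,
            "quantlet_hash": content_hash(quantlet)}
    return chain + [dict(body, hash=content_hash(body))]

def verify(chain: list) -> bool:
    """Recompute every link; any edited field breaks at least one check."""
    prev = GENESIS
    for block in chain:
        body = {k: block[k] for k in ("index", "prev_hash", "quantlet_hash")}
        if block["prev_hash"] != prev or block["hash"] != content_hash(body):
            return False
        prev = block["hash"]
    return True

# Hypothetical code snippets registered in order
q1 = {"name": "example_quantlet", "code": "print('hello')"}
q2 = {"name": "another_quantlet", "code": "x = 1 + 1"}
chain = append_block(append_block([], q1), q2)
print(verify(chain))      # True

tampered = [dict(b) for b in chain]
tampered[0]["quantlet_hash"] = content_hash({"name": "example_quantlet",
                                             "code": "print('bye')"})
print(verify(tampered))   # False
```

A reader can also check that a downloaded snippet is the registered one by comparing `content_hash(snippet)` with the stored `quantlet_hash`.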
BARMPy: Bayesian additive regression models Python package
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-04 DOI: 10.1007/s00180-024-01535-9
Danielle Van Boxel

We make Bayesian additive regression networks (BARN) available to general machine learning practitioners as a Python package, barmpy, with documentation at https://dvbuntu.github.io/barmpy/. Our object-oriented design is compatible with scikit-learn, so its tools, such as cross-validation, can be used directly. To make barmpy easier to learn, we provide a companion tutorial that expands on the reference information in the documentation. Any interested user can pip install barmpy from the official PyPI repository. barmpy also serves as a baseline Python library for generic Bayesian additive regression models.

Citations: 0
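scikit-learn compatibility essentially means following the estimator convention. Hyperparameters are set in `__init__`, `fit(X, y)` returns the fitted object, and `predict(X)` returns predictions. The sketch below shows that convention with a least-squares stand-in and a plain k-fold cross-validation loop written in NumPy. The barmpy estimator's class name and arguments are not given in the abstract, so it is not used here; consult the barmpy documentation before substituting it.

```python
import numpy as np

class LeastSquaresRegressor:
    """Stand-in estimator following the scikit-learn convention.

    A barmpy model would take its place; its exact class name and arguments
    are not stated in the abstract and are deliberately not assumed here.
    """

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        X = np.asarray(X, dtype=float)
        return np.column_stack([np.ones(len(X)), X]) if self.fit_intercept else X

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self                      # returning self enables chaining

    def predict(self, X):
        return self._design(X) @ self.coef_

def kfold_rmse(make_model, X, y, k=5, seed=0):
    """Plain k-fold cross-validation for any object with fit/predict."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = make_model().fit(X[train], y[train])
        resid = y[test] - model.predict(X[test])
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0   # noise-free toy data
scores = kfold_rmse(LeastSquaresRegressor, X, y, k=5)
print(scores.round(6))
```

With scikit-learn installed, `sklearn.model_selection.cross_val_score` performs this kind of loop for any compliant estimator. It clones the estimator, which additionally requires `get_params` and `set_params`; these are normally obtained by inheriting from `sklearn.base.BaseEstimator`, with `RegressorMixin` supplying a default `score` method.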
Robust confidence intervals for meta-regression with interaction effects
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-02 DOI: 10.1007/s00180-024-01530-0
Maria Thurow, Thilo Welz, Eric Knop, Tim Friede, Markus Pauly

Meta-analysis is an important statistical technique for synthesizing the results of multiple studies addressing the same or closely related research questions. So-called meta-regression extends meta-analysis models by accounting for study-level covariates. Mixed-effects meta-regression models provide a powerful tool for evidence synthesis by appropriately accounting for between-study heterogeneity. In fact, modelling the study effect in terms of random effects and moderators not only allows one to examine the impact of the moderators, but often leads to more accurate estimates of the parameters involved. Nevertheless, because the number of studies on a specific research topic is often small, interactions are often neglected in meta-regression. In this work we consider two research questions: (i) how moderator interactions influence inference in mixed-effects meta-regression models and (ii) whether some inference methods are more reliable than others. Here we review robust methods for confidence intervals in meta-regression models that include interaction effects. These methods apply robust sandwich estimators of Hartung-Knapp-Sidik-Jonkman (HKSJ) or heteroscedasticity-consistent (HC) type to estimate the variance-covariance matrix of the vector of model coefficients. Furthermore, we compare different versions of these robust estimators in an extensive simulation study, investigating the coverage and width of seven different confidence intervals under varying conditions. Our simulation study shows that the coverage rates and the interval widths of the parameter estimates are only slightly affected by adjustment of the parameters. It also appears that using the Satterthwaite approximation for the degrees of freedom is advantageous for accurate coverage rates. In addition, unlike previous analyses of simpler models, the HKSJ estimator performs worse in this more complex setting than some of the HC estimators.

Citations: 0
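The two families of covariance estimators compared in this study can be sketched for weighted least squares meta-regression with an interaction column. The sketch makes three assumptions:

- the between-study variance tau2 has already been estimated (for example by REML);
- only HC0 is shown among the HC variants;
- confidence intervals would then be beta ± t·SE, with a t quantile (e.g. from scipy.stats.t.ppf) on k − p or Satterthwaite-approximated degrees of freedom.

The function name and the simulated data are illustrative.

```python
import numpy as np

def meta_regression_cov(y, v, X, tau2):
    """WLS meta-regression coefficients with HKSJ-type and HC0 covariances.

    y    : (k,) observed study effects
    v    : (k,) within-study sampling variances
    X    : (k, p) design (intercept, moderators, interaction columns)
    tau2 : between-study variance, assumed already estimated
    """
    w = 1.0 / (v + tau2)                   # inverse-variance weights
    XtW = X.T * w                          # X^T W without forming diag(w)
    bread = np.linalg.inv(XtW @ X)         # (X^T W X)^{-1}
    beta = bread @ (XtW @ y)
    e = y - X @ beta                       # residuals
    k, p = X.shape
    # HKSJ: model-based covariance rescaled by a weighted residual variance
    q = np.sum(w * e**2) / (k - p)
    cov_hksj = q * bread
    # HC0 sandwich: bread @ X^T W diag(e^2) W X @ bread
    meat = (XtW * e**2) @ XtW.T
    cov_hc0 = bread @ meat @ bread
    return beta, cov_hksj, cov_hc0

rng = np.random.default_rng(2)
k = 20
x1 = rng.normal(size=k)
x2 = np.tile([0.0, 1.0], k // 2)           # binary moderator
X = np.column_stack([np.ones(k), x1, x2, x1 * x2])   # with interaction term
v = rng.uniform(0.01, 0.1, size=k)
tau2 = 0.05
y = X @ np.array([0.2, 0.5, -0.3, 0.4]) + rng.normal(scale=np.sqrt(v + tau2))

beta, cov_hksj, cov_hc0 = meta_regression_cov(y, v, X, tau2)
print("beta   ", beta.round(3))
print("SE HKSJ", np.sqrt(np.diag(cov_hksj)).round(3))
print("SE HC0 ", np.sqrt(np.diag(cov_hc0)).round(3))
```

Computing `X.T * w` avoids building a k-by-k diagonal weight matrix. It gives the same result as the textbook formula with W = diag(w).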
Ordinal causal discovery based on Markov blankets
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-30 DOI: 10.1007/s00180-024-01513-1
Yu Du, Yi Sun, Luyao Tan

This work focuses on learning causal network structures from ordinal categorical data. By combining constraint-based and score-and-search methodologies for structure learning, we propose a hybrid method, the Markov Blanket Based Ordinal Causal Discovery (MBOCD) algorithm, which can capture the ordinal relationship among the values of ordinal categorical variables. Theoretically, we prove that for ordinal causal networks, two adjacent DAGs belonging to the same Markov equivalence class are identifiable, which allows a causal graph to be generated. Simulation experiments demonstrate that the proposed algorithm outperforms existing methods in terms of computational efficiency and accuracy. The code for this work is available at: https://github.com/leoydu/MBOCDcode.git.

Citations: 0
A Metropolis–Hastings Robbins–Monro algorithm via variational inference for estimating the multidimensional graded response model: a calculationally efficient estimation scheme to deal with complex test structures
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-07-29 · DOI: 10.1007/s00180-024-01533-x
Xue Wang, Jing Lu, Jiwei Zhang

This paper introduces the Metropolis–Hastings variational inference Robbins–Monro (MHVIRM) algorithm, a modification of the Metropolis–Hastings Robbins–Monro (MHRM) method designed to estimate parameters in complex multidimensional graded response models (MGRMs). By integrating a black-box variational inference (BBVI) approach, MHVIRM improves computational efficiency and estimation accuracy, particularly for models with high-dimensional data and complex test structures. Simulations demonstrate the algorithm's effectiveness, showing higher precision than the traditional MHRM, especially in scenarios with complex structures and small sample sizes. MHVIRM is also robust to the choice of initial values. Its applicability is further illustrated through the analysis of a real dataset.
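The abstract does not describe MHVIRM's sampler and variational updates in enough detail to reproduce them, but the model whose parameters they estimate is standard. As a hedged illustration, the sketch below computes category probabilities for one item under a common logistic parameterization of Samejima's graded response model extended to D latent traits.

- Each boundary curve is P(Y >= k | theta) = 1 / (1 + exp(-(a'theta - b_k))), with strictly increasing thresholds b_k.
- Each category probability is the difference between adjacent boundary curves.

The parameterization (slopes `a`, thresholds `b`) is an assumption, since the paper may use intercepts or a different link. The function names are hypothetical.

```python
from math import exp
from typing import Sequence


def logistic(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)


def mgrm_category_probs(theta: Sequence[float], a: Sequence[float],
                        b: Sequence[float]) -> list:
    """Probabilities of the K = len(b) + 1 ordered categories of one item.

    theta: latent trait vector (length D); a: item slopes (length D);
    b: strictly increasing item thresholds b_1 < ... < b_{K-1}.
    """
    if len(theta) != len(a):
        raise ValueError("theta and a must have the same length")
    if any(lo >= hi for lo, hi in zip(b, b[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(ai * ti for ai, ti in zip(a, theta))  # linear predictor a'theta
    # Boundary curves P(Y >= k | theta) for k = 0..K, with P(Y >= 0) = 1 and P(Y >= K) = 0.
    cum = [1.0] + [logistic(eta - bk) for bk in b] + [0.0]
    # Category k receives the gap between adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]
```

With theta = (0, 0) and symmetric thresholds (-1, 0, 1), the four probabilities are symmetric (the first equals the last, the second equals the third) and sum to 1. Larger theta values shift probability mass toward the higher categories.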

Citations: 0