
Latest articles in Journal of the Royal Statistical Society Series B: Statistical Methodology

GRASP: a goodness-of-fit test for classification learning
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-23 | DOI: 10.1093/jrsssb/qkad106
Adel Javanmard, Mohammad Mehrabi
Abstract Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails to characterise the fit of the model to the underlying conditional law of labels given the feature vector (Y∣X), e.g. due to model misspecification, overfitting, and high dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit of a general binary classifier. Our framework does not make any parametric assumption on the conditional law Y∣X and treats it as a black-box oracle model that can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis test of the form H0: E[Df(Bern(η(X)) ‖ Bern(η̂(X)))] ≤ τ, where Df is an f-divergence function and η(x), η̂(x) denote, respectively, the true and the estimated likelihood that a feature vector x admits a positive label. We propose a novel test, called Goodness-of-fit with Randomisation and Scoring Procedure (GRASP), for testing H0, which works in finite-sample settings regardless of the feature distribution (distribution-free). We also propose model-X GRASP, designed for model-X settings where the joint distribution of the feature vector is known; model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
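The quantity appearing in H0 can be made concrete with a small simulation. The sketch below is hypothetical illustration code, not the authors' implementation: it takes the KL divergence as the f-divergence and estimates E[D_f(Bern(η(X)) ‖ Bern(η̂(X)))] by Monte Carlo for a known true η and a deliberately miscalibrated η̂. The GRASP test itself, which only queries labels and a black-box score, is not reproduced.

```python
import numpy as np

def bern_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), elementwise."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

rng = np.random.default_rng(0)
n, d = 50_000, 5
X = rng.normal(size=(n, d))
beta = np.array([1.0, -0.5, 0.25, 0.0, 0.0])

eta_true = 1 / (1 + np.exp(-X @ beta))          # true P(Y = 1 | X)
eta_hat = 1 / (1 + np.exp(-X @ (0.7 * beta)))   # a shrunken, misspecified estimate

# Monte Carlo estimate of E[D_KL(Bern(eta(X)) || Bern(eta_hat(X)))]
divergence = bern_kl(eta_true, eta_hat).mean()
tau = 0.01
print(f"estimated divergence = {divergence:.4f}; H0 (<= {tau}) plausible: {divergence <= tau}")
```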
Citations: 0
Spatial confidence regions for combinations of excursion sets in image analysis
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-21 | DOI: 10.1093/jrsssb/qkad104
Thomas Maullin-Sapey, Armin Schwartzman, Thomas E Nichols
Abstract The analysis of excursion sets in imaging data is essential to a wide range of scientific disciplines such as neuroimaging, climatology, and cosmology. Despite growing literature, there is little published concerning the comparison of processes that have been sampled across the same spatial region but which reflect different study conditions. Given a set of asymptotically Gaussian random fields, each corresponding to a sample acquired for a different study condition, this work aims to provide confidence statements about the intersection, or union, of the excursion sets across all fields. Such spatial regions are of natural interest as they directly correspond to the questions ‘Where do all random fields exceed a predetermined threshold?’, or ‘Where does at least one random field exceed a predetermined threshold?’. To assess the degree of spatial variability present, our method provides, with a desired confidence, subsets and supersets of spatial regions defined by logical conjunctions (i.e. set intersections) or disjunctions (i.e. set unions), without any assumption on the dependence between the different fields. The method is verified by extensive simulations and demonstrated using task-fMRI data to identify brain regions with activation common to four variants of a working memory task.
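For orientation on the objects being studied, the toy sketch below (illustrative code, not the authors' method) thresholds a few simulated smooth random fields on a common grid and forms the plug-in intersection and union of their excursion sets; the paper's actual contribution, confidence sub- and supersets for these regions, is not implemented here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
shape, threshold, n_fields = (128, 128), 1.0, 4

# Smooth white noise to mimic spatially correlated fields, one per study condition.
fields = [gaussian_filter(rng.normal(size=shape), sigma=4) for _ in range(n_fields)]
fields = [f / f.std() for f in fields]  # rescale to unit variance

# Excursion set of each field: {s : f(s) > threshold}.
excursions = [f > threshold for f in fields]

conjunction = np.logical_and.reduce(excursions)  # where *all* fields exceed the threshold
disjunction = np.logical_or.reduce(excursions)   # where *at least one* field exceeds it

print("pixels in intersection:", conjunction.sum())
print("pixels in union:       ", disjunction.sum())
```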
Citations: 2
Empirical bias-reducing adjustments to estimating functions
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-16 | DOI: 10.1093/jrsssb/qkad083
Ioannis Kosmidis, Nicola Lunardon
Abstract We develop a novel, general framework for reduced-bias M-estimation from asymptotically unbiased estimating functions. The framework relies on an empirical approximation of the bias by a function of derivatives of the estimating function contributions. Reduced-bias M-estimation operates either implicitly, solving empirically adjusted estimating equations, or explicitly, subtracting the estimated bias from the original M-estimates, and applies to partially or fully specified models with likelihoods or surrogate objectives. Automatic differentiation can abstract away the algebra required to implement reduced-bias M-estimation. As a result, the bias-reduction methods we introduce have broader applicability, straightforward implementation, and less algebraic or computational effort than other established bias-reduction methods that require resampling or expectations of products of log-likelihood derivatives. If M-estimation is carried out by maximising an objective, then there always exists a bias-reducing penalised objective. That penalised objective relates to information criteria for model selection and can be enhanced with plug-in penalties to deliver reduced-bias M-estimates with extra properties, like finiteness for categorical data models. Inferential and model selection procedures for M-estimators apply unaltered with the reduced-bias M-estimates. We demonstrate and assess the properties of reduced-bias M-estimation in well-used, prominent modelling settings of varying complexity.
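A familiar special case of a bias-reducing penalised objective is Firth-type logistic regression, where the log-likelihood is penalised by half the log-determinant of the Fisher information. The sketch below illustrates that classical special case only, not the paper's general empirical adjustment, and maximises the penalised objective numerically.

```python
import numpy as np
from scipy.optimize import minimize

def neg_penalised_loglik(beta, X, y):
    """Negative Firth-penalised log-likelihood for logistic regression."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0, eta))
    p = 1 / (1 + np.exp(-eta))
    W = p * (1 - p)
    # Jeffreys-prior penalty: 0.5 * log det(X' W X)
    _, logdet = np.linalg.slogdet(X.T @ (W[:, None] * X))
    return -(loglik + 0.5 * logdet)

rng = np.random.default_rng(2)
n, d = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta_true = np.array([0.5, 1.0, -1.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

fit = minimize(neg_penalised_loglik, x0=np.zeros(d), args=(X, y), method="BFGS")
print("Firth-type reduced-bias estimate:", fit.x.round(3))
```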
Citations: 1
Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-15 | DOI: 10.1093/jrsssb/qkad084
Vishesh Karwa, Debdeep Pati, Sonja Petrović, Liam Solus, Nikita Alexeev, Mateja Raič, Dane Wilburne, Robert Williams, Bowei Yan
Abstract We construct Bayesian and frequentist finite-sample goodness-of-fit tests for three different variants of the stochastic blockmodel for network data. Since all of the stochastic blockmodel variants are log-linear in form when block assignments are known, the tests for the latent block model versions combine a block membership estimator with the algebraic statistics machinery for testing goodness-of-fit in log-linear models. We describe Markov bases and marginal polytopes of the variants of the stochastic blockmodel and discuss how both facilitate the development of goodness-of-fit tests and understanding of model behaviour. The general testing methodology developed here extends to any finite mixture of log-linear models on discrete data, and as such is the first application of the algebraic statistics machinery for latent-variable models.
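As a point of reference, a generic Monte Carlo goodness-of-fit check for a stochastic blockmodel with known block assignments can be sketched as a parametric bootstrap: fit block-pair edge probabilities, simulate networks from the fit, and compare a statistic the model does not fit directly (here the triangle count). This is illustrative only; it is not the Markov-basis/algebraic-statistics machinery developed in the paper, and all names below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_sbm(z, P):
    """Symmetric adjacency matrix from block labels z and block-pair probabilities P."""
    probs = P[np.ix_(z, z)]
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(int)

def triangle_count(A):
    return int(np.trace(np.linalg.matrix_power(A, 3)) / 6)

def fit_block_probs(A, z, k):
    P_hat = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            block = A[np.ix_(z == a, z == b)]
            n_a = (z == a).sum()
            n_pairs = block.size if a != b else n_a * (n_a - 1)
            P_hat[a, b] = block.sum() / max(n_pairs, 1)
    return P_hat

# Observed network (here itself simulated, so H0 holds by construction).
k, n = 2, 100
z = rng.integers(k, size=n)
P_true = np.array([[0.15, 0.03], [0.03, 0.20]])
A_obs = simulate_sbm(z, P_true)

P_hat = fit_block_probs(A_obs, z, k)
t_obs = triangle_count(A_obs)
t_rep = [triangle_count(simulate_sbm(z, P_hat)) for _ in range(500)]
p_value = np.mean([t >= t_obs for t in t_rep])
print(f"observed triangles = {t_obs}, Monte Carlo p-value = {p_value:.3f}")
```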
Citations: 0
Holdout predictive checks for Bayesian model criticism
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-15 | DOI: 10.1093/jrsssb/qkad105
Gemma E Moran, David M Blei, Rajesh Ranganath
Abstract Bayesian modelling helps applied researchers to articulate assumptions about their data and develop models tailored for specific applications. Thanks to good methods for approximate posterior inference, researchers can now easily build, use, and revise complicated Bayesian models for large and rich data. These capabilities, however, bring into focus the problem of model criticism. Researchers need tools to diagnose the fitness of their models, to understand where they fall short, and to guide their revision. In this paper, we develop a new method for Bayesian model criticism, the holdout predictive check (HPC). Holdout predictive checks are built on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. However, PPCs use the data twice, both to calculate the posterior predictive and to evaluate it, which can lead to uncalibrated p-values. Holdout predictive checks, in contrast, compare the posterior predictive distribution to a draw from the population distribution: a held-out dataset. This method blends Bayesian modelling with frequentist assessment. Unlike the PPC, we prove that the HPC is properly calibrated. Empirically, we study the HPC on classical regression, a hierarchical model of text data, and factor analysis.
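The basic mechanics can be illustrated with a conjugate Gaussian model: fit the posterior on a training split, then compare a statistic of a held-out split with the same statistic computed on datasets drawn from the posterior predictive. The toy sketch below is a hypothetical, simplified illustration and does not reproduce the paper's calibration theory.

```python
import numpy as np

rng = np.random.default_rng(4)

# Data: the fitted Normal model is deliberately misspecified for heavy-tailed data.
data = rng.standard_t(df=3, size=400)
train, holdout = data[:300], data[300:]

# Conjugate Normal model with known variance sigma2 and a N(0, tau2) prior on the mean.
sigma2, tau2 = 1.0, 10.0
n = train.size
post_var = 1 / (n / sigma2 + 1 / tau2)
post_mean = post_var * train.sum() / sigma2

def statistic(x):
    return np.mean(np.abs(x) > 2.5)  # fraction of extreme observations

reps = []
for _ in range(2000):
    theta = rng.normal(post_mean, np.sqrt(post_var))                 # posterior draw
    x_rep = rng.normal(theta, np.sqrt(sigma2), size=holdout.size)    # posterior predictive replicate
    reps.append(statistic(x_rep))

p_value = np.mean(np.array(reps) >= statistic(holdout))
print(f"holdout-style predictive p-value = {p_value:.3f}")
```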
Citations: 8
David Huk, Lorenzo Pacchiardi, Ritabrata Dutta and Mark Steel’s contribution to the Discussion of “Martingale Posterior Distributions” by Fong, Holmes and Walker
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-14 | DOI: 10.1093/jrsssb/qkad094
David Huk, Lorenzo Pacchiardi, Ritabrata Dutta, Mark Steel
{"title":"David Huk, Lorenzo Pacchiardi, Ritabrata Dutta and Mark Steel’s contribution to the Discussion of “Martingale Posterior Distributions” by Fong, Holmes and Walker","authors":"David Huk, Lorenzo Pacchiardi, Ritabrata Dutta, Mark Steel","doi":"10.1093/jrsssb/qkad094","DOIUrl":"https://doi.org/10.1093/jrsssb/qkad094","url":null,"abstract":"","PeriodicalId":49982,"journal":{"name":"Journal of the Royal Statistical Society Series B-Statistical Methodology","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135552136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Correction to: Semi-supervised approaches to efficient evaluation of model prediction performance
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-14 | DOI: 10.1093/jrsssb/qkad107
{"title":"Correction to: Semi-supervised approaches to efficient evaluation of model prediction performance","authors":"","doi":"10.1093/jrsssb/qkad107","DOIUrl":"https://doi.org/10.1093/jrsssb/qkad107","url":null,"abstract":"","PeriodicalId":49982,"journal":{"name":"Journal of the Royal Statistical Society Series B-Statistical Methodology","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135552383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A fast asynchronous Markov chain Monte Carlo sampler for sparse Bayesian inference
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-13 | DOI: 10.1093/jrsssb/qkad078
Yves Atchadé, Liwei Wang
Abstract We propose a very fast approximate Markov chain Monte Carlo sampling framework that is applicable to a large class of sparse Bayesian inference problems. The computational cost per iteration in several regression models is of order O(n(s+J)), where n is the sample size, s is the underlying sparsity of the model, and J is the size of a randomly selected subset of regressors. This cost can be further reduced by data sub-sampling when stochastic gradient Langevin dynamics are employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. [(2013). Analyzing Hogwild parallel Gaussian Gibbs sampling. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS’13) (Vol. 2, pp. 2715–2723)], but can be viewed from a statistical perspective as a form of Bayesian iterated sure independent screening [Fan, J., Samworth, R., & Wu, Y. (2009). Ultrahigh dimensional feature selection: Beyond the linear model. Journal of Machine Learning Research, 10, 2013–2038]. We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that recovers correctly the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.
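The stochastic gradient Langevin dynamics component mentioned above follows the standard update of Welling and Teh (2011): a noisy gradient step on the log-posterior using a minibatch, plus injected Gaussian noise. The sketch below is a plain SGLD sampler for Bayesian logistic regression, shown only to make that update concrete; it is not the authors' asynchronous algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 5_000, 10
X = rng.normal(size=(N, d))
theta_true = rng.normal(size=d)
y = rng.binomial(1, 1 / (1 + np.exp(-X @ theta_true)))

prior_var, step, batch, n_iter = 10.0, 1e-4, 128, 5_000
theta = np.zeros(d)
samples = []

for t in range(n_iter):
    idx = rng.choice(N, size=batch, replace=False)
    p = 1 / (1 + np.exp(-X[idx] @ theta))
    # Unbiased log-posterior gradient estimate: prior term + rescaled minibatch likelihood term.
    grad = -theta / prior_var + (N / batch) * X[idx].T @ (y[idx] - p)
    theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.normal(size=d)
    if t > n_iter // 2:
        samples.append(theta.copy())

print("posterior mean estimate (first 3 coords):", np.mean(samples, axis=0)[:3].round(2))
print("true coefficients       (first 3 coords):", theta_true[:3].round(2))
```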
Citations: 0
Stationary nonseparable space-time covariance functions on networks
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-08 | DOI: 10.1093/jrsssb/qkad082
Emilio Porcu, Philip A White, Marc G Genton
Abstract The advent of data science has provided an increasing number of challenges with high data complexity. This paper addresses the challenge of space-time data where the spatial domain is not a planar surface, a sphere, or a linear network, but a generalised network (termed a graph with Euclidean edges). Additionally, data are repeatedly measured over different temporal instants. We provide new classes of stationary nonseparable space-time covariance functions where space can be a generalised network, a Euclidean tree, or a linear network, and where time can be linear or circular (seasonal). Because the construction principles are technical, we focus on illustrations that guide the reader through the construction of statistically interpretable examples. A simulation study demonstrates that the correct model can be recovered when compared to misspecified models. In addition, our simulation studies show that we effectively recover simulation parameters. In our data analysis, we consider a traffic accident dataset that shows improved model performance based on covariance specifications and network-based metrics.
Citations: 0
Derandomised knockoffs: leveraging e-values for false discovery rate control
Tier 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-09-07 | DOI: 10.1093/jrsssb/qkad085
Zhimei Ren, Rina Foygel Barber
Abstract Model-X knockoffs is a flexible wrapper method for high-dimensional regression algorithms, which provides guaranteed control of the false discovery rate (FDR). Due to the randomness inherent to the method, different runs of model-X knockoffs on the same dataset often result in different sets of selected variables, which is undesirable in practice. In this article, we introduce a methodology for derandomising model-X knockoffs with provable FDR control. The key insight of our proposed method lies in the discovery that the knockoffs procedure is in essence an e-BH procedure. We make use of this connection and derandomise model-X knockoffs by aggregating the e-values resulting from multiple knockoff realisations. We prove that the derandomised procedure controls the FDR at the desired level, without any additional conditions (in contrast, previously proposed methods for derandomisation are not able to guarantee FDR control). The proposed method is evaluated with numerical experiments, where we find that the derandomised procedure achieves comparable power and dramatically decreased selection variability when compared with model-X knockoffs.
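The connection to e-values can be made concrete: an average of e-values is again an e-value, and the base e-BH rule rejects the hypotheses with the k largest averaged e-values, where k is the largest i such that the i-th largest e-value is at least n/(αi). The sketch below implements that generic recipe on synthetic placeholder e-values; the construction of knockoff-specific e-values from each knockoff realisation is not reproduced here.

```python
import numpy as np

def ebh(e_values, alpha=0.1):
    """Base e-BH: reject the hypotheses with the k largest e-values,
    where k = max{ i : e_(i) >= n / (alpha * i) }, e_(i) sorted decreasingly."""
    n = len(e_values)
    order = np.argsort(e_values)[::-1]
    sorted_e = e_values[order]
    ks = [i + 1 for i in range(n) if sorted_e[i] >= n / (alpha * (i + 1))]
    if not ks:
        return np.array([], dtype=int)
    return np.sort(order[: max(ks)])

# Hypothetical e-values from K knockoff runs over n variables (synthetic, for illustration).
rng = np.random.default_rng(6)
K, n = 20, 50
e_runs = rng.exponential(scale=1.0, size=(K, n))
e_runs[:, :5] *= 40  # pretend the first five variables carry signal

e_avg = e_runs.mean(axis=0)        # an average of e-values is still an e-value
rejected = ebh(e_avg, alpha=0.1)
print("rejected variable indices:", rejected)
```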
Citations: 4