GRASP:分类学习的拟合优度检验

IF 3.1 1区数学 Q1 STATISTICS & PROBABILITY Journal of the Royal Statistical Society Series B-Statistical Methodology Pub Date : 2023-09-23 DOI:10.1093/jrsssb/qkad106

Adel Javanmard, Mohammad Mehrabi

{"title":"GRASP:分类学习的拟合优度检验","authors":"Adel Javanmard, Mohammad Mehrabi","doi":"10.1093/jrsssb/qkad106","DOIUrl":null,"url":null,"abstract":"Abstract Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterising the fit of the model to the underlying conditional law of labels given the features vector (Y∣X), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law Y∣X and treats that as a black-box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form H0:E[Df(Bern(η(X))‖Bern(η^(X)))]≤τ where Df represents an f-divergence function, and η(x), η^(x), respectively, denote the true and an estimate likelihood for a feature vector x admitting a positive label. We propose a novel test, called Goodness-of-fit with Randomisation and Scoring Procedure (GRASP) for testing H0, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X GRASP designed for model-X settings where the joint distribution of the features vector is known. Model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.","PeriodicalId":49982,"journal":{"name":"Journal of the Royal Statistical Society Series B-Statistical Methodology","volume":"43 1","pages":"0"},"PeriodicalIF":3.1000,"publicationDate":"2023-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GRASP: a goodness-of-fit test for classification learning\",\"authors\":\"Adel Javanmard, Mohammad Mehrabi\",\"doi\":\"10.1093/jrsssb/qkad106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterising the fit of the model to the underlying conditional law of labels given the features vector (Y∣X), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law Y∣X and treats that as a black-box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form H0:E[Df(Bern(η(X))‖Bern(η^(X)))]≤τ where Df represents an f-divergence function, and η(x), η^(x), respectively, denote the true and an estimate likelihood for a feature vector x admitting a positive label. We propose a novel test, called Goodness-of-fit with Randomisation and Scoring Procedure (GRASP) for testing H0, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X GRASP designed for model-X settings where the joint distribution of the features vector is known. Model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.\",\"PeriodicalId\":49982,\"journal\":{\"name\":\"Journal of the Royal Statistical Society Series B-Statistical Methodology\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2023-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Royal Statistical Society Series B-Statistical Methodology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jrsssb/qkad106\",\"RegionNum\":1,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Royal Statistical Society Series B-Statistical Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jrsssb/qkad106","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}

引用次数: 0

摘要

摘要分类器的性能通常以测试数据的平均准确率来衡量。尽管是一种标准度量，但平均精度在描述模型与给定特征向量(Y∣X)的潜在标签条件律的拟合方面失败，例如由于模型规格错误，过度拟合和高维。在本文中，我们考虑了评估一般二分类器的拟合优度的基本问题。我们的框架没有对条件律Y∣X做任何参数假设，并将其视为只能通过查询访问的黑盒oracle模型。我们将拟合优良度评估问题表述为形式为H0的容差假设检验:E[Df(Bern(η(X))‖Bern(η^(X)))]≤τ，其中Df表示f-散度函数，η(X)， η^(X)分别表示承认正标签的特征向量X的真似然和估计似然。我们提出了一种新的测试，称为随机化和评分程序(GRASP)的拟合优度测试，用于测试H0，它适用于有限样本设置，无论特征(无分布)如何。我们还提出了针对已知特征向量联合分布的model-X设置设计的model-X GRASP。Model-X GRASP利用这种分布信息来获得更好的动力。我们通过大量的数值实验来评估测试的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

GRASP: a goodness-of-fit test for classification learning

Abstract Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterising the fit of the model to the underlying conditional law of labels given the features vector (Y∣X), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law Y∣X and treats that as a black-box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form H0:E[Df(Bern(η(X))‖Bern(η^(X)))]≤τ where Df represents an f-divergence function, and η(x), η^(x), respectively, denote the true and an estimate likelihood for a feature vector x admitting a positive label. We propose a novel test, called Goodness-of-fit with Randomisation and Scoring Procedure (GRASP) for testing H0, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X GRASP designed for model-X settings where the joint distribution of the features vector is known. Model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of the Royal Statistical Society Series B-Statistical Methodology 数学-统计学与概率论

CiteScore

8.80

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Series B (Statistical Methodology) aims to publish high quality papers on the methodological aspects of statistics and data science more broadly. The objective of papers should be to contribute to the understanding of statistical methodology and/or to develop and improve statistical methods; any mathematical theory should be directed towards these aims. The kinds of contribution considered include descriptions of new methods of collecting or analysing data, with the underlying theory, an indication of the scope of application and preferably a real example. Also considered are comparisons, critical evaluations and new applications of existing methods, contributions to probability theory which have a clear practical bearing (including the formulation and analysis of stochastic models), statistical computation or simulation where original methodology is involved and original contributions to the foundations of statistical science. Reviews of methodological techniques are also considered. A paper, even if correct and well presented, is likely to be rejected if it only presents straightforward special cases of previously published work, if it is of mathematical interest only, if it is too long in relation to the importance of the new material that it contains or if it is dominated by computations or simulations of a routine nature.