Error rate control for classification rules in multiclass mixture models.

Pub Date: 2022-11-01 DOI: 10.1515/ijb-2020-0105

Tristan Mary-Huard, Vittorio Perduca, Marie-Laure Martin-Magniette, Gilles Blanchard

Volume 18, Issue 2, pp. 381-396

Citations: 2

Abstract




In the context of finite mixture models, one considers the problem of classifying as many observations as possible into the classes of interest while controlling the classification error rate in these same classes. As in statistical test theory, different type I-like and type II-like classification error rates can be defined, along with their associated optimal rules, where optimality means minimizing the type II error rate while controlling the type I error rate at some nominal level. It is first shown that finding an optimal classification rule boils down to searching for an optimal region of the observation space in which to apply the classical Maximum A Posteriori (MAP) rule. Depending on the misclassification rate to be controlled, the shape of the optimal region is provided, along with a heuristic to compute the optimal classification rule in practice. In particular, a multiclass FDR-like optimal rule is defined and compared to the thresholded MAP rule that is used in most applications. It is shown on both simulated and real datasets that the FDR-like optimal rule may be significantly less conservative than the thresholded MAP rule.
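To make the contrast between a thresholded MAP rule and an FDR-like rule concrete, here is a minimal Python sketch. It is not the rule derived in the paper: it assumes the posterior class probabilities of a fitted mixture model are already available as a matrix, and it uses a generic plug-in criterion (the mean of 1 minus the maximum posterior over the classified observations) as a stand-in for an FDR-like error rate. The function names `thresholded_map` and `average_error_rule`, the toy posteriors, and the level `alpha = 0.05` are all illustrative assumptions.

```python
import numpy as np


def thresholded_map(post, t):
    """Assign the MAP label when the maximum posterior is >= t, else abstain (-1)."""
    post = np.asarray(post, dtype=float)
    labels = post.argmax(axis=1)
    confidence = post.max(axis=1)
    return np.where(confidence >= t, labels, -1)


def average_error_rule(post, alpha):
    """Hypothetical plug-in rule (an assumption, not the paper's algorithm).

    Classify the largest set of observations whose mean estimated error,
    1 - max posterior, stays at or below alpha. Others are left unclassified (-1).
    """
    post = np.asarray(post, dtype=float)
    labels = post.argmax(axis=1)
    err = 1.0 - post.max(axis=1)
    order = np.argsort(err, kind="stable")  # most confident observations first
    running_mean = np.cumsum(err[order]) / np.arange(1, len(err) + 1)
    # The running mean of ascending values is nondecreasing, so the
    # observations satisfying the constraint form a prefix of the ordering.
    n_keep = int(np.sum(running_mean <= alpha))
    out = np.full(len(err), -1)
    out[order[:n_keep]] = labels[order[:n_keep]]
    return out


# Toy posteriors for 5 observations and 3 classes (made-up numbers).
post = np.array([
    [0.99, 0.005, 0.005],
    [0.02, 0.97, 0.01],
    [0.90, 0.06, 0.04],
    [0.10, 0.05, 0.85],
    [0.40, 0.35, 0.25],
])
alpha = 0.05
print(thresholded_map(post, 1 - alpha))   # classifies the first two observations
print(average_error_rule(post, alpha))    # classifies the first three observations
```

In this toy case the fixed threshold 1 - alpha bounds the error of every classified observation individually, whereas the averaged criterion only bounds the mean error, so it can classify one more observation at the same nominal level. This is only an illustration of why a thresholded MAP rule may be conservative, under the assumptions stated above.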
