Title: Classification Under Partial Reject Options
Authors: Måns Karlsson, Ola Hössjer
Journal: Journal of Classification (impact factor 1.8; JCR Q2, Mathematics, Interdisciplinary Applications)
Publication date: 2023-11-25
DOI: 10.1007/s00357-023-09455-x
URL: https://doi.org/10.1007/s00357-023-09455-x
Citation count: 0
Abstract
In many applications there is ambiguity about which (if any) of a finite number N of hypotheses best fits an observation. It is then of interest to output a whole set of categories, where the size of the classified set ranges from 0 to N. An empty set corresponds to an outlier, a set of size 1 represents a firm decision that singles out one hypothesis, a set of size N corresponds to a rejection to classify, and sets of sizes \(2,\ldots ,N-1\) represent a partial rejection to classify, where some hypotheses are excluded from further analysis. In this paper, we review and unify several proposed methods of Bayesian set-valued classification, where the objective is to find the optimal Bayesian classifier, that is, the one that maximizes the expected reward. We study a large class of reward functions that reward sets containing the true category and impose additive or multiplicative penalties that depend on the size of the set. For models with one homogeneous block of hypotheses, we provide general expressions for the accompanying Bayesian classifier, several of which extend previous results in the literature. We then derive novel results for the more general setting in which hypotheses are partitioned into blocks, and ambiguity within blocks differs in severity from ambiguity between blocks. We also discuss how well-known methods of classification, such as conformal prediction, indifference zones, and hierarchical classification, fit into our framework. Finally, set-valued classification is illustrated using an ornithological data set, with taxa partitioned into blocks and parameters estimated using MCMC. The tuning parameters of the associated reward function are chosen through cross-validation.
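To make the additive-penalty idea concrete, the following is a minimal sketch under a simplifying assumption that is ours, not necessarily the paper's: the reward is 1 if the true category lies in the output set, minus a hypothetical cost `c` for each category included. Under that assumed reward, the expected reward of a set S given posterior probabilities p_i is the sum over i in S of (p_i - c), so the maximizing set simply contains every category with p_i > c. The function names and the example posterior are illustrative only; the brute-force search is included just to check the threshold rule.

```python
from itertools import combinations


def expected_reward(posterior, subset, c):
    """Expected reward of `subset` under the ASSUMED reward
    1{true category in subset} - c * |subset| (illustrative only)."""
    return sum(posterior[i] for i in subset) - c * len(subset)


def best_set_bruteforce(posterior, c):
    """Search all 2^N subsets and return one with maximal expected reward.
    The empty set (expected reward 0) is the starting candidate."""
    n = len(posterior)
    best, best_val = (), 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            val = expected_reward(posterior, subset, c)
            if val > best_val + 1e-12:  # strict improvement only
                best, best_val = subset, val
    return set(best)


def best_set_threshold(posterior, c):
    """Closed form for the assumed reward: keep categories with p_i > c."""
    return {i for i, p in enumerate(posterior) if p > c}


# Hypothetical posterior over N = 4 categories.
posterior = [0.5, 0.3, 0.15, 0.05]

firm = best_set_threshold(posterior, 0.4)      # size 1: firm decision
partial = best_set_threshold(posterior, 0.2)   # size 2: partial rejection
full = best_set_threshold(posterior, 0.01)     # size N: rejection to classify
empty = best_set_threshold(posterior, 0.6)     # size 0: outlier
```

Varying `c` in this toy setting moves the output through the cases described in the abstract: an empty set, a single category, a partial rejection, and a full rejection. How the paper's actual reward family and blockwise structure change the optimal rule is not covered by this sketch.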
About the journal:
The journal publishes original and valuable papers in the field of classification, numerical taxonomy, multidimensional scaling and other ordination techniques, clustering, tree structures and other network models (with somewhat less emphasis on principal components analysis, factor analysis, and discriminant analysis), as well as associated models and algorithms for fitting them. Articles will support advances in methodology while demonstrating compelling substantive applications. Comprehensive review articles are also acceptable. Contributions will represent disciplines such as statistics, psychology, biology, information retrieval, anthropology, archeology, astronomy, business, chemistry, computer science, economics, engineering, geography, geology, linguistics, marketing, mathematics, medicine, political science, psychiatry, sociology, and soil science.