Certifiable Robustness for Nearest Neighbor Classifiers

Austen Z. Fan, Paraschos Koutris
{"title":"Certifiable Robustness for Nearest Neighbor Classifiers","authors":"Austen Z. Fan, Paraschos Koutris","doi":"10.4230/LIPIcs.ICDT.2022.6","DOIUrl":null,"url":null,"abstract":"ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/LIPIcs.ICDT.2022.6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.
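To make the definition concrete, the sketch below checks certifiable robustness for 1-NN by brute force: it enumerates every subset repair of a small inconsistent dataset under a single functional dependency A -> B and tests whether all repairs agree on the predicted label. This is only an illustration of the definition, not the paper's dichotomy algorithm (which avoids exponential enumeration in the tractable cases); the toy data, attribute names, and helper functions are hypothetical.

```python
from itertools import product

# Each training tuple: (A, B, feature, label). Tuples that share A but
# disagree on B violate the functional dependency A -> B.
data = [
    ("t1", "x", 1.0, "+"),
    ("t1", "y", 1.2, "-"),   # conflicts with the previous tuple under A -> B
    ("t2", "x", 3.0, "+"),
]

def repairs(dataset):
    """Enumerate subset repairs for the single FD A -> B: for each value of A,
    keep all tuples that agree on one chosen value of B."""
    blocks = {}  # A-value -> {B-value -> list of tuples}
    for t in dataset:
        blocks.setdefault(t[0], {}).setdefault(t[1], []).append(t)
    for choice in product(*(b.values() for b in blocks.values())):
        yield [t for group in choice for t in group]

def one_nn_label(repair, query):
    """1-NN prediction on a repair (1-D features, absolute-value distance)."""
    return min(repair, key=lambda t: abs(t[2] - query))[3]

def certify(dataset, query):
    """Return the label if every repair predicts it, otherwise None (not robust)."""
    labels = {one_nn_label(r, query) for r in repairs(dataset)}
    return labels.pop() if len(labels) == 1 else None

print(certify(data, 2.9))  # "+": t2 is in every repair and is always nearest
print(certify(data, 1.1))  # None: the prediction depends on which t1-tuple survives
```

The counting version of the problem mentioned in the abstract corresponds to tallying, rather than merely collecting, the labels produced across repairs; the brute-force version above is exponential in the number of conflicting groups, which is exactly the cost the paper's polynomial-time cases avoid.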