Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk)

P. Barceló
{"title":"机器学习模型的可解释性查询及其与数据管理问题的联系(特邀演讲)","authors":"P. Barceló","doi":"10.4230/LIPIcs.ICDT.2021.1","DOIUrl":null,"url":null,"abstract":"In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over a ML model in order to obtain meaningful justifications for their results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management. This with the intention of suggesting new possibilities for cross-fertilization between the area and ML. The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley-value of the contribution of a tuple in the database to the final result can be computed in polynomial time. The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multilayer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models. 2012 ACM Subject Classification Theory of computation → Models of learning","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Explainability Queries for ML Models and its Connections with Data Management Problems (Invited Talk)\",\"authors\":\"P. Barceló\",\"doi\":\"10.4230/LIPIcs.ICDT.2021.1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over a ML model in order to obtain meaningful justifications for their results. Both of the examples I will present deal with “local” and “post-hoc” explainability queries. 
Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management. This with the intention of suggesting new possibilities for cross-fertilization between the area and ML. The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley-value of the contribution of a tuple in the database to the final result can be computed in polynomial time. The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multilayer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models. 2012 ACM Subject Classification Theory of computation → Models of learning\",\"PeriodicalId\":90482,\"journal\":{\"name\":\"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4230/LIPIcs.ICDT.2021.1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/LIPIcs.ICDT.2021.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over an ML model in order to obtain meaningful justifications for its results. Both of the examples I will present deal with "local" and "post-hoc" explainability queries. Here "local" means that we intend to explain the output of the ML model for a particular input, while "post-hoc" refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management, with the intention of suggesting new possibilities for cross-fertilization between that area and ML.

The first example refers to computing explanations as scores based on Shapley values, in particular the recently proposed, and already influential, SHAP-score. This score measures how much each feature of the input contributes to the output of the ML model. We provide a detailed analysis of the complexity of computing it for different classes of Boolean circuits. In particular, we show that computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if either of these restrictions is lifted. The tractability part of this result generalizes a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley value of the contribution of a database tuple to the final result can be computed in polynomial time.

The second example refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. As models, I will consider multilayer perceptrons and binary decision diagrams; the main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results provide an interesting theoretical counterpart to common wisdom's claims about interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and for obtaining general tractability results for such languages over specific classes of models.

2012 ACM Subject Classification: Theory of computation → Models of learning
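For concreteness, the SHAP-score instantiates the classical Shapley value with a cooperative game in which the players are the input features. In a standard formulation (the exact distribution used in the expectation is a modeling choice), the score of feature $i$ for input $e$ and model $M$ over feature set $X$ is

$$\mathrm{SHAP}(M,e,i) \;=\; \sum_{S \subseteq X\setminus\{i\}} \frac{|S|!\,(|X|-|S|-1)!}{|X|!}\,\bigl(\phi_e(S\cup\{i\}) - \phi_e(S)\bigr),$$

where $\phi_e(S) = \mathbb{E}\bigl[M(z)\mid z_S = e_S\bigr]$ is the expected output of $M$ when the features in $S$ are fixed to their values in $e$ and the remaining features are drawn at random (e.g., from the uniform distribution over Boolean inputs).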
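To make the computed quantity concrete, here is a minimal brute-force sketch, not the algorithm from the talk, that evaluates this formula exactly under the uniform distribution for a toy Boolean classifier. Its exponential running time is precisely what the tractability result for deterministic and decomposable circuits avoids.

```python
from itertools import combinations
from math import factorial

def shap_score(model, entity, i, features):
    """Exact SHAP-score of feature i for `entity` under the uniform
    distribution, by brute force over all subsets S of the remaining
    features. Exponential in |features|; only viable for tiny models."""
    n = len(features)
    others = [f for f in features if f != i]

    def expected_output(fixed):
        # E[model(z) | z agrees with `fixed`], remaining features uniform.
        free = [f for f in features if f not in fixed]
        total = 0.0
        for bits in range(2 ** len(free)):
            z = dict(fixed)
            for k, f in enumerate(free):
                z[f] = (bits >> k) & 1
            total += model(z)
        return total / (2 ** len(free))

    score = 0.0
    for size in range(n):
        # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for S in combinations(others, size):
            fixed = {f: entity[f] for f in S}
            with_i = dict(fixed)
            with_i[i] = entity[i]
            score += weight * (expected_output(with_i) - expected_output(fixed))
    return score

# Toy Boolean classifier: x1 AND (x2 OR x3), explained at e = (1, 1, 0).
model = lambda z: int(z["x1"] and (z["x2"] or z["x3"]))
e = {"x1": 1, "x2": 1, "x3": 0}
for f in ["x1", "x2", "x3"]:
    print(f, round(shap_score(model, e, f, ["x1", "x2", "x3"]), 4))
# x1 ≈ 0.4167, x2 ≈ 0.2917, x3 ≈ -0.0833; they sum to M(e) - E[M] = 0.625.
```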
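The intuition for why determinism and decomposability yield tractability: on such circuits the conditional expectations $\phi_e(S)$ can be computed bottom-up in linear time, since the children of a decomposable AND-gate mention disjoint variables (so their probabilities multiply) and the children of a deterministic OR-gate are never simultaneously true (so their probabilities add). A toy sketch, using my own ad hoc tuple encoding of circuits rather than any format from the talk:

```python
# Node encoding: ("var", name, polarity) | ("and", children) | ("or", children)
def uniform_prob(node, fixed):
    """Pr[circuit = 1] under the uniform distribution, with the features
    in `fixed` pinned to given values. Linear time, relying on the circuit
    being decomposable (AND children share no variables -> multiply) and
    deterministic (OR children mutually exclusive -> add)."""
    kind = node[0]
    if kind == "var":
        _, name, polarity = node
        if name in fixed:
            return 1.0 if fixed[name] == polarity else 0.0
        return 0.5  # free Boolean feature under the uniform distribution
    probs = [uniform_prob(child, fixed) for child in node[1]]
    if kind == "and":
        result = 1.0
        for p in probs:
            result *= p
        return result
    return sum(probs)  # "or": determinism makes the branches disjoint

# x1 AND (x2 OR x3), rewritten deterministically as
# x1 AND (x2 OR (NOT x2 AND x3)) so the OR branches never overlap.
circuit = ("and", [("var", "x1", 1),
                   ("or", [("var", "x2", 1),
                           ("and", [("var", "x2", 0),
                                    ("var", "x3", 1)])])])
print(uniform_prob(circuit, {}))         # 0.375
print(uniform_prob(circuit, {"x1": 1}))  # 0.75
```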
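As a flavor of the second example, one family of local, post-hoc queries asks whether a partial assignment is a sufficient reason for a model's decision, i.e., whether every completion of it yields the same output. The sketch below, again a hypothetical encoding rather than anything from the talk, answers this query by a simple traversal of a binary decision diagram; over multilayer perceptrons the analogous check is computationally hard, which illustrates the kind of complexity gap between model classes that the talk examines.

```python
def is_sufficient_reason(node, partial, target):
    """Over a binary decision diagram, check that every completion of the
    partial assignment reaches the `target` value. Nodes are either a leaf
    ("leaf", value) or a decision node ("node", var, low, high). Runs in
    time linear in the size of the diagram."""
    if node[0] == "leaf":
        return node[1] == target
    _, var, low, high = node
    if var in partial:
        branch = high if partial[var] else low
        return is_sufficient_reason(branch, partial, target)
    # Variable unconstrained: both branches must force the target value.
    return (is_sufficient_reason(low, partial, target)
            and is_sufficient_reason(high, partial, target))

# Diagram for x1 AND x2: test x1 first, then x2.
leaf0, leaf1 = ("leaf", 0), ("leaf", 1)
bdd = ("node", "x1", leaf0, ("node", "x2", leaf0, leaf1))
print(is_sufficient_reason(bdd, {"x1": 0}, 0))           # True: x1=0 forces 0
print(is_sufficient_reason(bdd, {"x2": 1}, 1))           # False: x1 still free
print(is_sufficient_reason(bdd, {"x1": 1, "x2": 1}, 1))  # True
```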