An Intrinsically Interpretable Entity Matching System

Advances in Database Technology: Proceedings. International Conference on Extending Database Technology. Pub Date: 2023-01-01. DOI: 10.48786/edbt.2023.54

Andrea Baraldi, Francesco Del Buono, Francesco Guerra, Matteo Paganelli, M. Vincini

Volume: 31 1. Pages: 645-657.

Abstract

Explainable classification systems generate predictions together with a weight for each term in the input record, measuring that term's contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions, and the resulting explanations can be difficult for users to understand: they can be very long, and they can assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each belonging to a different entity description, or by unique terms that exist in only one of the descriptions. Decision units form a new feature space that represents pairs of entity descriptions in a compact and meaningful way. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we realize this idea through a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. We then introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM datasets. The experiments show that our approach achieves accuracy comparable to that of other state-of-the-art deep-learning-based EM models but, unlike them, produces highly interpretable predictions.
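
To make the notion of decision units and the three-component template more concrete, the following is a minimal Python sketch. It is an illustration only, not the WYM implementation: the function names (`generate_decision_units`, `score_unit`, `match`), the character-level similarity from `difflib`, the greedy pairing strategy, the `0.8` threshold, the `-1.0` score for unique terms, and the mean-score decision rule are all assumptions chosen for brevity. The abstract does not specify how WYM pairs or scores terms.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# A decision unit is (left_term, right_term); None marks a unique (unpaired) term.
DecisionUnit = Tuple[Optional[str], Optional[str]]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for any term similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_decision_units(left: str, right: str,
                            threshold: float = 0.8) -> List[DecisionUnit]:
    """Hypothetical generator: greedily pair each left term with its most
    similar unused right term; terms without a close partner stay unique."""
    left_terms, right_terms = left.split(), right.split()
    used = set()
    units: List[DecisionUnit] = []
    for lt in left_terms:
        best_j, best_sim = None, 0.0
        for j, rt in enumerate(right_terms):
            if j in used:
                continue
            sim = similarity(lt, rt)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            used.add(best_j)
            units.append((lt, right_terms[best_j]))
        else:
            units.append((lt, None))
    # Right-hand terms that were never paired are unique units as well.
    units.extend((None, rt) for j, rt in enumerate(right_terms) if j not in used)
    return units


def score_unit(unit: DecisionUnit) -> float:
    """Hypothetical scorer: paired units score their similarity,
    unique terms score -1.0 (evidence against a match)."""
    lt, rt = unit
    if lt is None or rt is None:
        return -1.0
    return similarity(lt, rt)


def match(units: List[DecisionUnit],
          cutoff: float = 0.0) -> Tuple[bool, List[Tuple[DecisionUnit, float]]]:
    """Hypothetical explainable matcher: the mean unit score decides the
    prediction, and the per-unit scores double as the explanation."""
    scored = [(u, score_unit(u)) for u in units]
    mean = sum(s for _, s in scored) / len(scored) if scored else 0.0
    return mean > cutoff, scored


if __name__ == "__main__":
    units = generate_decision_units("sony camera dsc-w800 black",
                                    "sony dsc-w800 digital camera")
    is_match, explanation = match(units)
    print(is_match)
    for unit, score in explanation:
        print(unit, round(score, 2))
```

In this sketch the explanation has one entry per decision unit rather than one per term, which is the compactness the abstract argues for: a term that appears in both descriptions contributes a single paired unit instead of two separately weighted terms.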
