Andrea Baraldi, Francesco Del Buono, Francesco Guerra, Matteo Paganelli, M. Vincini
{"title":"An Intrinsically Interpretable Entity Matching System","authors":"Andrea Baraldi, Francesco Del Buono, Francesco Guerra, Matteo Paganelli, M. Vincini","doi":"10.48786/edbt.2023.54","DOIUrl":null,"url":null,"abstract":"Explainable classification systems generate predictions along with a weight for each term in the input record measuring its contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions and the resulting explanations can be difficult to understand for the users. They can be very long and assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each one belonging to a different entity description, or unique terms, existing in one of the descriptions only. Decision units form a new feature space, able to represent, in a compact and meaningful way, pairs of entity descriptions. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we propose this idea via a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. Then, we introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM databases. The experiments show that our approach has accuracy comparable to other state-of-the-art Deep Learning based EM models, but, differently from them, its predictions are highly interpretable.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"31 1","pages":"645-657"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in database technology : proceedings. International Conference on Extending Database Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48786/edbt.2023.54","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Explainable classification systems generate predictions along with a weight for each term in the input record measuring its contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions and the resulting explanations can be difficult to understand for the users. They can be very long and assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each one belonging to a different entity description, or unique terms, existing in one of the descriptions only. Decision units form a new feature space, able to represent, in a compact and meaningful way, pairs of entity descriptions. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we propose this idea via a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. Then, we introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM databases. The experiments show that our approach has accuracy comparable to other state-of-the-art Deep Learning based EM models, but, differently from them, its predictions are highly interpretable.
可解释的分类系统生成预测,并为输入记录中的每个术语提供权重,以衡量其对预测的贡献。在实体匹配(EM)场景中,输入是成对的实体描述,结果的解释对于用户来说可能很难理解。它们可以很长,并将不同的影响分配给位于不同描述中的相似术语。为了解决这些问题,我们引入了决策单元的概念,即,基本信息单元由(相似的)术语对组成,每个术语属于不同的实体描述,或者唯一的术语,只存在于一个描述中。决策单元形成一个新的特征空间,能够以紧凑和有意义的方式表示成对的实体描述。在这些特征上训练的可解释模型生成针对EM数据集定制的有效解释。在本文中,我们通过一个三组件架构模板提出了这个想法,该模板由决策单元生成器、决策单元评分器和可解释的匹配器组成。然后,我们介绍了WYM (Why do You Match?),一种面向文本EM数据库的体系结构实现。实验表明,我们的方法具有与其他最先进的基于深度学习的EM模型相当的准确性,但是,与它们不同的是,它的预测是高度可解释的。