DeepBlock: A Novel Blocking Approach for Entity Resolution using Deep Learning

Delaram Javdani, H. Rahmani, Milad Allahgholi, Fatemeh Karimkhani
Published in: 2019 5th International Conference on Web Research (ICWR), pp. 41-44
Publication date: 2019-04-01
DOI: https://doi.org/10.1109/ICWR.2019.8765267
Citations: 5

Abstract

Entity resolution is the process of identifying and integrating records that belong to the same unique entity. Standard methods use rule-based or machine learning models to compare each pair of records and assign it a score indicating whether the pair matches. However, exhaustively comparing all record pairs gives the matching step quadratic complexity. Blocking methods are therefore applied before matching to group records that likely refer to the same entity into small blocks, and exhaustive matching is then performed only within each block. Several blocking methods have been proposed to partition the input data efficiently into manageable groups, including token blocking, which places records that share a token in the same block. Most previous methods do not take any semantic criteria into account. In this paper, we propose a new method, called DeepBlock, that uses deep learning for the blocking task in entity resolution. DeepBlock combines syntactic and semantic similarities to calculate the similarity between records. We evaluated DeepBlock on a real-world dataset and compared it with an existing blocking technique (token blocking). Our experimental results show that combining semantic and syntactic similarity can considerably improve the quality of blocking: DeepBlock significantly outperforms token blocking with respect to the pair quality (PQ) measure.
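
To make the baseline and the metric concrete, the following is a minimal sketch of token blocking and of pair quality, not the authors' implementation and not DeepBlock itself. It assumes whitespace tokenization, lowercase normalization, and the usual definition of PQ as the fraction of candidate pairs that are true matches; the records, the ground-truth set, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Map each token to the set of record ids whose text contains it."""
    blocks = defaultdict(set)
    for rec_id, text in records.items():
        # Assumed tokenization: lowercase, split on whitespace.
        for token in set(text.lower().split()):
            blocks[token].add(rec_id)
    # A block holding a single record produces no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Distinct unordered record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def pair_quality(candidates, true_matches):
    """PQ = (candidate pairs that are true matches) / (all candidate pairs)."""
    if not candidates:
        return 0.0
    return len(candidates & true_matches) / len(candidates)


# Hypothetical toy data: records 1-2 and 3-4 describe the same products.
records = {
    1: "Apple iPhone 8 64GB",
    2: "iphone 8 apple smartphone",
    3: "Samsung Galaxy S9",
    4: "galaxy s9 by samsung 64GB",
}
true_matches = {(1, 2), (3, 4)}

blocks = token_blocking(records)
candidates = candidate_pairs(blocks)
all_pairs = len(records) * (len(records) - 1) // 2  # exhaustive comparison count

print(f"{len(candidates)} of {all_pairs} pairs kept, "
      f"PQ = {pair_quality(candidates, true_matches):.2f}")
# prints: 3 of 6 pairs kept, PQ = 0.67
```

In this toy run the shared token "64gb" places records 1 and 4 in the same block even though they describe different products, which lowers PQ. This is the kind of purely syntactic false candidate that a semantic similarity signal is intended to filter out.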