Delaram Javdani, H. Rahmani, Milad Allahgholi, Fatemeh Karimkhani
DeepBlock: A Novel Blocking Approach for Entity Resolution using Deep Learning
2019 5th International Conference on Web Research (ICWR), pp. 41-44, published 2019-04-01
DOI: 10.1109/ICWR.2019.8765267 (https://doi.org/10.1109/ICWR.2019.8765267)
Citations: 5
Abstract
Entity resolution is the process of identifying and integrating records that belong to the same real-world entity. Standard methods use rule-based or machine learning models to compare a pair of records and assign a score indicating whether the pair matches. However, comparing every pair of records exhaustively leads to quadratic matching complexity. Blocking methods are therefore applied before matching to group records that likely refer to the same entity into small blocks, and the exhaustive matching is then performed only within each block. Several blocking methods have been proposed to partition the input data efficiently into manageable groups, including token blocking, which places records that share a token in the same block. Most previous methods do not take any semantic criteria into account. In this paper, we propose a new method, called DeepBlock, that uses deep learning for the blocking task in entity resolution. DeepBlock combines syntactic and semantic similarity to calculate the similarity between records. We evaluated DeepBlock on a real-world dataset and compared it with an existing blocking technique (token blocking). Our experimental results show that combining semantic and syntactic similarity can considerably improve the quality of blocking, and that DeepBlock significantly outperforms token blocking with respect to the pair quality (PQ) measure.
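To make the baseline and the evaluation measure concrete, the following is a minimal sketch of token blocking and pair quality. It is not the DeepBlock method and is not taken from the paper: the function names, the whitespace tokenizer, and the five toy records are assumptions chosen for illustration. Pair quality is computed here in its usual sense, the fraction of candidate pairs produced by blocking that are true matches.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Group record ids by shared token (hypothetical whitespace tokenizer)."""
    blocks = defaultdict(set)
    for rid, text in records.items():
        for token in set(text.lower().split()):
            blocks[token].add(rid)
    # A block holding a single record yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def pair_quality(pairs, true_matches):
    """Fraction of candidate pairs that are true matches."""
    if not pairs:
        return 0.0
    return len(pairs & true_matches) / len(pairs)


# Toy data, invented for this example only.
records = {
    1: "Apple iPhone 8 64GB",
    2: "iphone 8 apple smartphone",
    3: "Samsung Galaxy S9",
    4: "galaxy s9 by samsung",
    5: "Apple MacBook Pro",
}
true_matches = {(1, 2), (3, 4)}

blocks = token_blocking(records)
pairs = candidate_pairs(blocks)
pq = pair_quality(pairs, true_matches)

n = len(records)
all_pairs = n * (n - 1) // 2  # size of the exhaustive, quadratic comparison

print(f"{len(pairs)} candidate pairs instead of {all_pairs}, PQ = {pq:.2f}")
```

On this toy input, blocking reduces the 10 possible comparisons to 4 candidate pairs, of which 2 are true matches, giving PQ = 0.50. The two non-matching pairs come from the shared token "apple", which illustrates how purely syntactic token overlap can place unrelated records in the same block.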