Unsupervised Measuring of Entity Resolution Consistency
Jeffrey Fisher, Qing Wang
2015 IEEE International Conference on Data Mining Workshop (ICDMW), 2015-11-14
DOI: 10.1109/ICDMW.2015.162 (https://doi.org/10.1109/ICDMW.2015.162)
Citation count: 4
Abstract
Entity resolution (ER) is a common data cleaning and data integration task that aims to determine which records in one or more data sets refer to the same real-world entities. In most cases no training data exists, and the ER process involves considerable trial and error, with an often time-consuming manual evaluation required to determine whether the obtained results are good enough. We propose a method that makes use of transitive closure within triples of records to provide an early indication of inconsistency in an ER result in an unsupervised fashion. We test our approach on three real-world data sets with different similarity calculations and blocking approaches and show that it can detect problems with ER results early on, without a manual evaluation.
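
The abstract does not spell out the measure itself, so the following is only an illustrative sketch of the general idea of checking transitivity within triples of records, not the authors' method. It assumes the ER result is available as a set of pairwise match decisions; the function names `triple_inconsistency` and `inconsistency_rate`, and the choice to normalise by triples with at least two matched pairs, are hypothetical.

```python
from itertools import combinations


def triple_inconsistency(records, matches):
    """Count transitivity violations over all triples of records.

    Assumption (not from the paper): a triple is "connected" when at least
    two of its three pairs are matched, and "inconsistent" when exactly two
    are matched, because transitivity would then imply the third pair
    should be a match as well.
    """
    matched = {frozenset(pair) for pair in matches}
    inconsistent = 0
    connected = 0
    for a, b, c in combinations(records, 3):
        # Number of matched pairs among the three pairs of this triple.
        n = sum(frozenset(p) in matched for p in ((a, b), (b, c), (a, c)))
        if n >= 2:
            connected += 1
            if n == 2:
                inconsistent += 1
    return inconsistent, connected


def inconsistency_rate(records, matches):
    """Fraction of connected triples that violate transitivity."""
    inconsistent, connected = triple_inconsistency(records, matches)
    return inconsistent / connected if connected else 0.0


# Toy example: r2, r3, r4 form a consistent cluster, but r1 is linked
# only to r2, which leaves two triples open.
records = ["r1", "r2", "r3", "r4"]
matches = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r2", "r4")]
print(triple_inconsistency(records, matches))
print(inconsistency_rate(records, matches))
```

Enumerating all triples is cubic in the number of records, so on real data a check like this would presumably be restricted to records that share a block or a candidate pair. A high rate would suggest that the similarity function or threshold is producing matches that do not agree with each other, which is the kind of early warning the abstract describes.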