Mixed Attributes Two-Stage-Clustering Entity Resolution
Lei Gang
通讯和计算机:中英文版, volume 12 1, pages 297-302. Journal Article, published 2015-12-28.
DOI: https://doi.org/10.17265/1548-7709/2015.06.003
Abstract
Record matching and clustering are two essential steps in entity resolution, yet clustering on a single text similarity computed from tf-idf (term frequency-inverse document frequency) features often yields poor precision when resolving travel-spot entities. The paper outlines a mixed-attributes two-stage-clustering entity resolution framework (abbreviated as MATC-ER) and designs a similarity measure that combines the spot name with the spot introduction, so that the information in each record is put to good use at different stages. Comparative experiments on real travel-spot data are then used to demonstrate the efficiency of the approach.
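The two-stage idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which attribute drives which stage, so the sketch assumes that stage one groups records by name similarity (character-bigram Jaccard) and stage two splits each group by tf-idf cosine similarity of the introductions. The function names, the thresholds `name_thr` and `intro_thr`, the smoothed idf formula, and the sample records are all hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Character bigrams of a name, ignoring case and spaces."""
    s = text.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def name_sim(a, b):
    """Jaccard similarity of character bigrams (an assumed name measure)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)


def tfidf_vectors(docs):
    """Sparse tf-idf vectors with a smoothed idf (assumed weighting)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def components(ids, linked):
    """Connected components over ids, linking pairs where linked(i, j) holds."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in ids:
        for j in ids:
            if i < j and linked(i, j):
                parent[find(i)] = find(j)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def two_stage_cluster(records, name_thr=0.3, intro_thr=0.2):
    """Stage 1: coarse groups by name. Stage 2: refine by introduction text."""
    ids = list(range(len(records)))
    vecs = tfidf_vectors([r["intro"] for r in records])
    coarse = components(
        ids,
        lambda i, j: name_sim(records[i]["name"], records[j]["name"]) >= name_thr,
    )
    final = []
    for block in coarse:
        final.extend(
            components(block, lambda i, j: cosine(vecs[i], vecs[j]) >= intro_thr)
        )
    return sorted(final)


# Hypothetical records, invented for illustration only.
records = [
    {"name": "West Lake",
     "intro": "famous freshwater lake in hangzhou with pagodas and gardens"},
    {"name": "West Lake Scenic Area",
     "intro": "freshwater lake in hangzhou surrounded by gardens and pagodas"},
    {"name": "West Lake",
     "intro": "artificial lake park in fuzhou city center"},
    {"name": "Great Wall",
     "intro": "ancient defensive wall across northern china"},
]
clusters = two_stage_cluster(records)
print(clusters)
```

In this toy data, the name stage alone would merge all three "West Lake" records, while the introduction stage separates the Fuzhou record from the two Hangzhou ones. A real system would need tuned thresholds and a tokenizer suited to the language of the data.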